ABSTRACT

We present the galaxy two-point angular correlation function for galaxies selected
from the seventh data release of the Sloan Digital Sky Survey. The galaxy sample was
selected with r-band apparent magnitudes between 17 and 21, and we measure the
correlation function for the full sample as well as for four magnitude ranges: 17-18,
18-19, 19-20, and 20-21. We update the flag criteria to select a clean galaxy catalog
and detail specific tests that we perform to characterize systematic effects, including
the effects of seeing, Galactic extinction, and the overall survey uniformity. Notably,
we find that optimally we can use observed regions with seeing < 1.5" and r-band
extinction < 0.13 magnitudes, smaller than previously published results. Furthermore,
we confirm that the uniformity of the SDSS photometry is minimally affected by
the stripe geometry. We find that, overall, the two-point angular correlation function
can be described by a power law, $w(\theta) = A_w \theta^{1-\gamma}$ with $\gamma \approx 1.72$, over the range
0.005°-10°. We also find similar relationships for the four magnitude subsamples, but
the amplitude within the same angular interval for the four subsamples is found to
decrease with fainter magnitudes, in agreement with previous results. We find that the
systematic signals are well below the galaxy angular correlation function for angles less
than approximately 5°, which limits the modeling of galaxy angular correlations on
larger scales. Finally, we present our custom, highly parallelized two-point correlation
code that we used in this analysis.
1 INTRODUCTION

One of the most powerful and simplest probes of the galaxy 
distribution is the two-point angular correlation function, 
which quantifies the excess probability above a random dis- 
tribution of finding one galaxy within a specified angle of 
another galaxy. For the case of a Gaussian random field, 
the two-point angular correlation function and its Legendre 
transform pair provide a complete statistical characteriza- 
tion of the galaxy clustering (see, e.g., Peebles 1980). Even 
for the case of non-Gaussianity, the two-point angular cor- 
relation function provides a simple and important statistical 
test of galaxy formation models (Tegmark et al. 2004). 
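
For reference, the verbal definition above is commonly written as follows. This is the standard form found in, e.g., Peebles (1980); the symbols $\bar{n}$ (mean surface density of galaxies) and $d\Omega_1$, $d\Omega_2$ (two solid-angle elements separated by the angle $\theta$) are notation introduced here for illustration and are not defined elsewhere in this text.

```latex
\delta P = \bar{n}^{2} \left[ 1 + w(\theta) \right] d\Omega_{1} \, d\Omega_{2}
```

Here $\delta P$ is the joint probability of finding one galaxy in each of the two solid-angle elements, so that $w(\theta) = 0$ corresponds to an unclustered (Poisson) distribution.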

The two-point angular correlation function has been
studied at bright magnitudes using data releases from the
Sloan Digital Sky Survey (SDSS), such as the Early Data
Release (EDR; Connolly et al. 2002). This data release covered
a few hundred square degrees of the sky, and the two-point
galaxy angular correlation function was calculated on
scales from a few arcseconds to a few degrees. The measured
correlation functions from the EDR were consistently
found to obey a power law, $w(\theta) = A_w \theta^{1-\gamma}$, where
$\gamma \approx 1.7$ on small scales, with a break at 2°, beyond which
the correlation dropped more steeply (Connolly et al. 2002). For
deeper surveys, the power law relation of the small-scale
correlation function held, with the amplitude decreasing at
fainter magnitudes (Connolly et al. 2002).
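
To illustrate the power-law form quoted above, the following minimal sketch fits w(θ) = A θ^(1-γ) by linear least squares in log-log space. The helper name `fit_power_law` and the synthetic, noise-free data (with an assumed amplitude of 0.01) are hypothetical and chosen only for illustration; this is not the fitting procedure used in this paper.

```python
import numpy as np

def fit_power_law(theta, w):
    """Fit w = A * theta**(1 - gamma) by least squares in log-log space.

    Hypothetical helper for illustration; assumes theta > 0 and w > 0.
    Returns the tuple (A, gamma).
    """
    slope, intercept = np.polyfit(np.log10(theta), np.log10(w), 1)
    return 10.0 ** intercept, 1.0 - slope

# Synthetic, noise-free example with assumed A = 0.01 and gamma = 1.72.
theta = np.logspace(-2, 0, 20)            # angles in degrees
w = 0.01 * theta ** (1.0 - 1.72)
A_fit, gamma_fit = fit_power_law(theta, w)
```

A real measurement would need to weight the fit by the (correlated) errors on each angular bin, which this sketch ignores.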

While these early SDSS results have provided a nice 
description of the angular clustering of galaxies, they only 
covered a relatively small area of the sky. In this paper, we 
present the measurement of the SDSS DR7 galaxy two-point 
angular correlation function. The SDSS DR7 galaxy sample
covers nearly 10^4 square degrees of the sky and includes
approximately 10^8 galaxies to a median redshift of 0.22.
Furthermore, in comparison to the SDSS EDR, the data
processing techniques of the SDSS DR7 have been greatly
improved (Abazajian et al. 2004, 2009). The DR7 thus provides
better image quality and photometric calibrations, with less
severe systematic effects, and will, therefore, provide a more
robust measurement of the galaxy angular clustering than
previous large scale surveys.

To accurately calculate the galaxy two-point angular
correlation function, we must first minimize potential
systematic effects in the galaxy catalog used to measure the
correlation function. The systematics of the SDSS EDR
were thoroughly studied by Scranton et al. (2002). To
minimize the systematic effects of seeing and Galactic
extinction, they determined that the SDSS EDR galaxy sample
had to be masked to exclude regions with seeing greater
than 1.75" and reddening > 0.2 magnitudes. Given the
importance of minimizing the impact of systematic effects on
the galaxy two-point angular correlation function and the
significant changes that were made in the SDSS data
processing pipeline between the SDSS EDR and the SDSS DR7,
we have repeated many of the tests presented in Scranton
et al. (2002) by using the SDSS DR7 data. In this paper
we present the methods used to contain these systematic
effects, the results of these systematic tests, the actual galaxy
two-point angular correlation function for the SDSS DR7,
and our massively parallel implementation that rapidly
calculates correlation functions for large data sets.

In this paper, we first discuss the data and data sam- 
ples in §2, and we quantify the magnitude and source clas- 
sification completeness limits in §3. After detailing our test- 
ing of the effects of different systematics and determining 
the optimal cuts to minimize their effects in §4, we present 
the angular correlation function of galaxies and sub-samples 
split into magnitude bins in §5. Next, we discuss our fast, 
tree-based correlation function code that we used to quickly 
calculate two-point angular correlation functions for these 
large data sets in §5.3. Finally, we discuss these results and
offer conclusions in §6.



2 THE DATA 




Figure 1. Top: The full, primary data from the SDSS DR7. Bot- 
tom: The same data, but now showing only galaxies that are 
further cut to the theoretical SDSS footprint; restricted by obser- 
vational flags and masked holes; and color-coded to indicate their 
SDSS stripe. 



The SDSS was a photometric and spectroscopic survey 
conducted by the Astrophysical Research Consortium at the 
Apache Point Observatory in New Mexico that was primar- 
ily designed to produce a data set to map large scale struc- 
ture in the universe. The telescope was instrumented with 
either a wide-field, multi-band CCD camera or dual fiber- 
fed spectrographs. Cumulatively, the SDSS imaged over one- 
quarter of the entire sky, providing photometric informa- 
tion in five bands: u, g, r, i, and z (Fukugita et al. 1996). 
The data release studied herein, SDSS DR7, was released 
in November 2008, and includes objects observed through 
August 2008 (Abazajian et al. 2009). 

The main survey was centered on the north Galactic
pole and was imaged in 37 interlaced stripes. Each stripe,
which was observed during two days between the years 1999
and 2008, is 2.5° wide, and the two ends of each stripe extend to
low Galactic latitudes. The surveyed area includes a contiguous
portion in the northern Galactic hemisphere (34
stripes) and three individual stripes observed repeatedly in
the southern Galactic hemisphere. In total, the data cover
approximately 10^4 deg^2 of the sky and consist of angular
positions for around 10^8 galaxies to a 5σ detection limit of
r ~ 23.1 (York et al. 2000).

The photometric calibration was carried out by a sepa- 
rate 0.5-m photometric telescope adjacent to the SDSS main 
2.5-m telescope (Photometric Telescope; Gunn et al. 2006). 
A set of 157 standard stars, which covered the entire range
in right ascension of the survey, was calibrated to the SDSS
filter system (Smith et al. 2002), and the main telescope ob- 
served these primary standards every night to quantify the 
relevant atmospheric extinction. 



2.1 The Main Galaxy Sample 

The full data from the SDSS DR7 are shown in the top 
panel of Figure 1, which contains galaxies and stars observed 
between March 1999 and August 2008. The complete pro- 
cedure required to go from the SDSS data archive to our 
final galaxy sample is detailed in Appendix A; in this sec- 
tion we provide an overview of this process. Starting from 
the results of an SDSS CAS query that selected all objects 
with dereddened g, r, or i magnitudes < 23.0, we first cut 
this sample to mask regions containing bright stars located 
within our Galaxy, and subsequently cut the remaining data 
to the theoretical footprint provided by the SDSS (e.g., Myers
et al. 2007). Next, we restrict the sample to consist of all
sources that pass the appropriate flag tests as indicated by
the SDSS project to select an observationally clean sample
(the specific cuts used are described at
http://www.sdss.org/DR7/products/catalogs/flags.html in the section
entitled Clean sample of galaxies and in Appendix A3) that
consists of stars and galaxies.

After restricting the data in the aforementioned manner,
the data still include blank regions that lie within the
survey area (see, e.g., Figure 2 for several examples). To
simplify the process of masking these regions, we utilize the
official survey λ/η coordinates^1 and manually check each
stripe. Once an area of missing data is visually located, we
identify the corners of the region bounding the missing data
to an accuracy of 0.1 degrees to further mask the affected
region. As quantified in §3, we identify galaxies in this sample
by using the SDSS type parameter, and limit the entire sample
to have extinction corrected r-band magnitudes within
the range 17 < r < 21, as specifically justified by the results
presented in §3.2.

^1 http://www.sdss.org/dr7/glossary/#survey_coords

Figure 2. Representative example areas in the SDSS DR7 footprint
with missing data.
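
The rectangular hole masking described above can be sketched as follows. This is a minimal illustration, assuming each hole is stored as a (λ_min, λ_max, η_min, η_max) rectangle in degrees and that the survey coordinates can be treated as locally flat; the function name `in_any_rectangle` and the example hole are hypothetical and are not taken from the actual mask.

```python
import numpy as np

def in_any_rectangle(lam, eta, rectangles):
    """Return True where (lam, eta) falls inside any masked rectangle.

    rectangles: iterable of (lam_min, lam_max, eta_min, eta_max) in degrees.
    Treating survey coordinates as locally flat is an assumption of this sketch.
    """
    lam = np.asarray(lam, dtype=float)
    eta = np.asarray(eta, dtype=float)
    masked = np.zeros(lam.shape, dtype=bool)
    for lam_min, lam_max, eta_min, eta_max in rectangles:
        masked |= ((lam >= lam_min) & (lam <= lam_max) &
                   (eta >= eta_min) & (eta <= eta_max))
    return masked

# Hypothetical hole with corners rounded to 0.1 degrees.
holes = [(10.0, 10.5, -33.8, -33.2)]
lam = np.array([10.2, 11.0, 10.4])
eta = np.array([-33.5, -33.5, -30.0])
keep = ~in_any_rectangle(lam, eta, holes)   # True for sources outside all holes
```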

While the SDSS data set has been homogenized to
the fullest extent possible, the data were observed in stripes
that are each approximately 2.5° wide and of variable length
(the stripes used in our angular correlation function analysis
range from approximately 105° to 130° along the SDSS
λ coordinate). We select galaxies both from the northern
Galactic hemisphere, which is a contiguous area of thirty- 
four stripes, and the southern Galactic hemisphere, which 
has only three stripes. In the bottom panel of Figure 1, we 
present our final galaxy sample, color-coded to indicate the 
SDSS stripe to which they belong. 

In the end, our data cover ~ 8,000 deg^2 of the sky. The
final galaxy sample we analyze (i.e., 17 < r < 21) contains
nearly 22 million galaxies with a median redshift of z = 0.21.
To quantify the dependence on magnitude of our galaxy two- 
point angular correlation measurements, we split the full 
galaxy sample by magnitude into four sub-samples: 17 < 
r < 18 (~ 0.8 million galaxies), 18 < r < 19 (~ 2.5 million 
galaxies), 19 < r < 20 (~ 7.2 million galaxies), and 20 < 
r < 21 (~ 19.3 million galaxies). 
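
As a small illustration of the magnitude split described above, the sketch below builds one boolean mask per magnitude bin. The helper name `split_by_magnitude` is hypothetical, and the use of half-open bins (lo <= r < hi), so that no source is counted twice, is an assumption of this sketch.

```python
import numpy as np

def split_by_magnitude(r_mag, edges=(17.0, 18.0, 19.0, 20.0, 21.0)):
    """Map each (lo, hi) magnitude bin to a boolean mask with lo <= r < hi."""
    r_mag = np.asarray(r_mag, dtype=float)
    return {(lo, hi): (r_mag >= lo) & (r_mag < hi)
            for lo, hi in zip(edges[:-1], edges[1:])}

# Toy dereddened r-band magnitudes; the first and last fall outside 17-21.
r = np.array([16.5, 17.2, 18.9, 19.0, 20.7, 21.3])
masks = split_by_magnitude(r)
counts = {bin_: int(mask.sum()) for bin_, mask in masks.items()}
```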

2.2 Stripe 82 Coadd Data 

While the SDSS data have been carefully reduced and 
calibrated, we still need to quantify the limiting magnitude 
of the main sample for cosmological analyses. To identify 
this magnitude limit, we need to compare the SDSS data to 
a deeper, more complete data set over as wide an area as 
possible. While several options exist for making this comparison,
in the end we chose to use the coadded Stripe
82 data produced by the SDSS Legacy Survey that were
also published as part of the SDSS DR7 (Abazajian et al. 
2009). While not as deep as other possible data sets, these 
data have the benefit of being taken with the same tele- 
scope and instrument as the main SDSS DR7 photometric 
data. And, after the coaddition of the individual observa- 
tions, these data were reduced with the same data process- 
ing software stack (Annis et al. 2011), thereby minimizing 
any systematic differences between the main and test data 
sets. 

The SDSS Legacy Survey was a 3-year extension of the 
original SDSS that began operations in July 2005 and com- 
pleted in July 2008. This legacy survey contains data from 
both the SDSS-I and SDSS-II projects, and covers more than 
7,500 square degrees of the northern Galactic hemisphere 
and 740 square degrees of the southern Galactic hemisphere. 
One of the primary science drivers for the SDSS-II project 
was to detect and measure light curves for a large number 
of supernovae (Frieman et al. 2008). As a result, the SDSS 
southern equatorial stripe 82 was repeatedly imaged dur- 
ing this survey extension during the months of September, 
October, and November (i.e., the three months when this 
stripe could be observed at the lowest airmass) in each of 
the three years: 2005-2007 (Abazajian et al. 2009). In the 
interest of constructing dense light curves for variable su- 
pernovas, these photometric data were acquired even when 
conditions were non-optimal. 

The SDSS has released 123 runs that cover the Stripe 82
footprint^2, which have been observed under variable seeing,
sky brightness, and photometric conditions. The best runs
have been coadded by the SDSS collaboration to produce a
final Stripe 82 coadded catalog, in which any given region
has been observed between 20 and 40 times. Thus, the final
Stripe 82 coadded catalog is nearly two magnitudes deeper
than a single SDSS observation (Annis et al. 2011), and
covers an area 2.5° wide and ~ 110° long, ranging from -50°
to 60° in right ascension (as this is an equatorial stripe,
right ascension is approximately equivalent to λ, which is
the survey longitude coordinate). As a result, we use these
coadded Stripe 82 data to define the completeness limits of
the main DR7 sample, which is discussed in Section 3.

We selected the deeper, coadded data covering the
Stripe 82 footprint by following the same procedures used
for the main galaxy sample, but now applied to the SDSS
CAS Stripe 82 Catalog^3. Specifically, we first use the same
query specified in Appendix A1 to select the Stripe 82 coadded
data, after which we cut these data to the Stripe 82
footprint as described in Appendix A2, and we finally se- 
lect clean detections by employing the flag cuts as described 
in Appendix A3. This produces a sample of ~8.4 million 
sources from the Stripe 82 coadded data (hereafter 'coadd'). 
In the same manner, we also select sources (both galaxies 
and stars) from the full DR7 catalog that lie within the 
Stripe 82 footprint (hereafter 'main sample'), which consists 
of ~4.3 million sources. 



^2 http://www.sdss.org/dr7/coverage/sndr7.html
^3 http://cas.sdss.org/stripe82/
Table 1. The percentage of matched sources between the Stripe
82 main sample and the Stripe 82 coadd data, split into all
galaxies, all stars, and galaxies and stars in the magnitude range
17 < r < 21.

r-band model            All                    17 < r < 21
magnitude difference    Galaxies    Stars      Galaxies    Stars
0.1                     42.3%       73.8%      63.3%       95.1%
0.2                     64.0%       88.7%      81.2%       98.8%
0.5                     89.9%       98.7%      93.8%       99.7%
1.0                     98.2%       99.8%      97.9%       99.8%



3 COMPLETENESS LIMITS 

When making cosmological measurements from the full 
SDSS DR7 data, we wish to be as inclusive as possible
while minimizing any systematic effects. By using the SDSS
EDR data, which were denoted by starred magnitudes (e.g.,
r*) as opposed to the final unstarred magnitudes (e.g., r),
Scranton et al. (2002) suggested that r* < 22 was sufficient. 
A later analysis of the SDSS EDR data by Infante et al. 
(2002), however, suggested a brighter limit of r* < 20.5 was 
more appropriate. In addition, a subsequent SDSS analysis 
demonstrated that the photometric pipeline used to process 
the SDSS EDR data incorrectly produced a 0.2 magnitude 
offset (Abazajian et al. 2004), which was corrected in later 
data releases. As a result, before addressing any other spe- 
cific systematic effects, we must first identify the magnitude 
range over which large-scale photometric analyses can be 
reliably performed with the SDSS DR7 data. This requires 
that we cross-match the main sample data to the deeper, 
coadd data within the Stripe 82 footprint. 



3.1 Cross-Matching Between Catalogs 

When matching sources between two surveys, there are typically
two restrictions that can be used to correctly identify
the same source in both surveys. The first restriction is
the use of a distance limit to force matched sources to be
physically close on the sky, while the second restriction is a
magnitude limit that forces matched sources to have similar
measured fluxes. In our case, we are matching between
two surveys that use the same telescope, imaging camera,
and data reduction pipeline, with the only real difference
being that the coadd data are measured from an image that
results from the combination of a large number of observations
taken in varying conditions over a number of different
years. Thus, while our matching algorithm must
employ a small distance tolerance for a successful match, we
did not feel a magnitude restriction was appropriate.

As a result, to match objects between the main and the
coadd samples, we only imposed a distance limit of 0.56",
which is the approximate diagonal size of an SDSS camera
pixel. Once the matching between the two surveys was
completed, we calculated the differences in the dereddened
r-band model magnitudes between the matched sources,
and tabulated the results in Table 1. Overall, approximately
56.4% of the matched objects have a magnitude difference
less than 0.1, and about 75.1% have a magnitude difference
less than 0.2, although it is also clear that galaxies show
considerably larger magnitude differences than their stellar
counterparts.
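
A distance-only match of the kind described above can be sketched as follows. This is a brute-force illustration that assumes both catalogs provide right ascension and declination in degrees; the function names are hypothetical, and a production match over millions of sources would use a spatial index (for example, a tree) rather than the full pairwise separation matrix built here.

```python
import numpy as np

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine form; inputs in degrees)."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = np.sin((dec2 - dec1) / 2.0)
    sin_dra = np.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + np.cos(dec1) * np.cos(dec2) * sin_dra ** 2
    return np.degrees(2.0 * np.arcsin(np.sqrt(a)))

def cross_match(ra_a, dec_a, ra_b, dec_b, max_sep_arcsec=0.56):
    """For each source in catalog A, return the index of its nearest neighbor
    in catalog B, or -1 if no source in B lies within max_sep_arcsec.

    Brute force, O(N_a * N_b): suitable for a demonstration only.
    """
    ra_a, dec_a = np.asarray(ra_a, float), np.asarray(dec_a, float)
    ra_b, dec_b = np.asarray(ra_b, float), np.asarray(dec_b, float)
    sep = angular_separation_deg(ra_a[:, None], dec_a[:, None],
                                 ra_b[None, :], dec_b[None, :])
    nearest = np.argmin(sep, axis=1)
    matched = sep[np.arange(len(ra_a)), nearest] * 3600.0 <= max_sep_arcsec
    return np.where(matched, nearest, -1)

# Toy catalogs: offsets of 0.1", 0.3", and 2.0" from the "main" positions.
ra_main = [10.0, 20.0, 30.0]
dec_main = [0.0, 0.0, 0.0]
ra_coadd = [20.0 + 0.3 / 3600.0, 10.0 + 2.0 / 3600.0, 10.0 + 0.1 / 3600.0]
dec_coadd = [0.0, 0.0, 0.0]
idx = cross_match(ra_main, dec_main, ra_coadd, dec_coadd)
```

No magnitude tolerance is applied, mirroring the choice described in the text; magnitude differences are then examined only after matching.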

After exploring this issue in more detail, primarily by 
visually inspecting a number of matched sources with large 
magnitude differences, we have found three primary rea- 
sons for the relatively high number of sources with larger 
than expected magnitude differences. First, the observations
used to construct the coadd image were taken over a
number of years, allowing for source photometric variability
to induce magnitude differences. Second, the coadd image
extends fainter than a standard, single pass SDSS image,
and will, therefore, have a lower background sky level. This
means that the SDSS processing pipeline will probe to a
lower surface brightness, which can result in a change in
the measured size of a galaxy and thus its model magnitude.
Finally, the deeper coadd image will also contain more
sources, which will lead to crowding issues that can complicate
both source deblending and pixel assignment. These
will also both change the measured size of a galaxy and thus
its model magnitude. Because these effects account for the
larger magnitude differences, we feel confident
in the use of this cross-matched catalog to determine
a suitable magnitude limit for our main sample data.

3.2 Magnitude Limit 

After constructing the cross-matched catalog, we first look 
to identify the magnitude limit we must impose on the main 
sample data. To do this, we use the deeper coadd data as 
a guide to indicate where the main sample becomes
incomplete. To quantify this limit, we divide the Stripe 82
footprint into 10 chunks. Within each of these chunks, we
compute the fraction of sources in the coadd data that are 
matched to sources in the main sample in bins of width 0.2 
magnitudes. Since not all coadd data are matched, this pro- 
cess begins by using the coadd r-band, dereddened model 
magnitudes. 

By combining the matched fractions within a given
magnitude bin across all chunks, we obtain a distribution
that characterizes the detection completeness of the main
sample as a function of the coadd r-band magnitude. We 
present the minimum, maximum, and median values of these 
distributions as the vertical error bars and square points, re- 
spectively, in the left plot of Figure 3. From this distribution, 
we see that the median value remains consistent with 90% 
completeness or better to a dereddened, coadd r-band model 
magnitude limit of r = 21. 
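
The per-chunk completeness estimate described above can be sketched as follows. This is a minimal illustration that assumes each chunk supplies an array of coadd magnitudes and a boolean flag marking which coadd sources were matched to the main sample; the function name `matched_fraction`, the toy data, and the narrow magnitude range are hypothetical.

```python
import numpy as np

def matched_fraction(coadd_mag, is_matched, lo, hi, width=0.2):
    """Fraction of coadd sources matched to the main sample, per magnitude bin.

    Bins with no coadd sources are returned as NaN.
    """
    coadd_mag = np.asarray(coadd_mag, dtype=float)
    is_matched = np.asarray(is_matched, dtype=bool)
    edges = np.arange(lo, hi + width / 2.0, width)
    total, _ = np.histogram(coadd_mag, bins=edges)
    matched, _ = np.histogram(coadd_mag[is_matched], bins=edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(total > 0, matched / total, np.nan)

# Two toy "chunks", each with two magnitude bins: [20.0, 20.2) and [20.2, 20.4].
chunks = [
    ([20.1, 20.1, 20.3, 20.3], [True, True, True, False]),
    ([20.1, 20.1, 20.3, 20.3], [True, False, True, True]),
]
fractions = np.array([matched_fraction(m, f, 20.0, 20.4) for m, f in chunks])

# Summary statistics across chunks, one value per magnitude bin.
frac_min = np.nanmin(fractions, axis=0)
frac_median = np.nanmedian(fractions, axis=0)
frac_max = np.nanmax(fractions, axis=0)
```

The minimum, median, and maximum across chunks correspond to the error bars and points described for the left panel of Figure 3.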

However, since we must apply this magnitude limit to
the entire SDSS DR7 main galaxy sample, we need to compute
the corresponding dereddened r-band model magnitude
limit for the main sample. To do this, we take the distribution
of matched sources across all chunks within a given
coadd magnitude bin, and compute the mean and standard
deviation of the main sample magnitudes for all sources (we
do exclude all non-detections from the main sample in this
calculation). We present these values as the crosses and
horizontal error bars in the left plot of Figure 3, which indicates
that the same magnitude limit of r ~ 21 is appropriate for
the main sample. We further confirmed this result by verifying
that the average difference between the dereddened
r-band model magnitude for a main sample source and the
same source in the coadd is consistent with zero, with an
increasing dispersion to fainter magnitudes as expected.

Figure 3. Left: The detection completeness of sources in the main sample as a function of their dereddened r-band model magnitude.
The squares and vertical error bars show the median, minimum, and maximum fraction of matched sources between the coadd and
main sample as a function of the coadd magnitude. The crosses and horizontal error bars are the mean and standard deviation of the
main sample magnitudes for the matched sources, showing that the match fraction remains above 90% complete to r ≈ 21 for the main
sample. Right: The classification completeness and contamination of main sample galaxies as a function of their dereddened r-band
model magnitude, showing that we are above 95% complete at r = 21. The completeness (contamination) is measured by identifying
galaxies (stars) in the deeper, coadd data that are classified as galaxies in the main sample. The points indicate the median value, while
the upper and lower limits correspond to the maximum and minimum values, respectively.



3.3 Star/Galaxy Classification 

The detection completeness is only one part of the picture, 
however, as we also must know the accuracy of the SDSS 
pipeline's source classification as a function of dereddened 
r-band model magnitude. To compute the classification
completeness, we repeat the analysis in the previous section,
but now start with sources classified in the main sample as
galaxies (i.e., type = 3). Specifically, within each chunk and
in bins of width 0.2 magnitudes, we compute
the fraction of main sample galaxies classified as galaxies in
the deeper, coadd data (i.e., the classification completeness)
and as stars in the deeper, coadd data (i.e., the classification
contamination).

From these distributions, we compute the minimum, 
maximum, and median fractional values as a function of 
the main sample dereddened r-band model magnitude. We 
present these results in the right-hand panel of Figure 3, 
where the galaxy completeness is displayed in red and the 
stellar contamination is displayed in blue. In either case, 
the minimum and maximum fractional values are displayed 
as the error bars while the median values are shown as 
the points. From this figure, we see that our completeness 
is above 95% at our previously stated dereddened r-band 
model magnitude limit of r = 21, and in fact that source 
classification is reliable over the entire magnitude range of 
17 < r < 21. 
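
The classification completeness and contamination defined in this section can be sketched for a single chunk and magnitude bin as follows. The text specifies type = 3 for galaxies; the use of type = 6 for stars, the function name, and the toy data are assumptions of this sketch.

```python
import numpy as np

GALAXY = 3  # type code for galaxies, as stated in the text
STAR = 6    # type code for stars (assumed here; not stated in the text)

def classification_fractions(main_type, coadd_type):
    """Return (completeness, contamination) for matched sources.

    completeness:  fraction of main sample galaxies that are galaxies in the coadd
    contamination: fraction of main sample galaxies that are stars in the coadd
    """
    main_type = np.asarray(main_type)
    coadd_type = np.asarray(coadd_type)
    is_main_galaxy = main_type == GALAXY
    n_galaxies = int(is_main_galaxy.sum())
    if n_galaxies == 0:
        return float("nan"), float("nan")
    coadd_of_galaxies = coadd_type[is_main_galaxy]
    completeness = np.sum(coadd_of_galaxies == GALAXY) / n_galaxies
    contamination = np.sum(coadd_of_galaxies == STAR) / n_galaxies
    return float(completeness), float(contamination)

# Toy matched catalog: four main sample galaxies, one of which the coadd calls a star.
main_type = [3, 3, 3, 3, 6]
coadd_type = [3, 3, 3, 6, 6]
completeness, contamination = classification_fractions(main_type, coadd_type)
```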



4 RESULTS OF SYSTEMATIC TESTS FROM DR7

Scranton et al. (2002) performed a detailed analysis by us- 
ing the SDSS EDR to quantify possible systematic effects 
on clustering measurements that use the SDSS main galaxy 
sample. This work was leveraged repeatedly by subsequent 
authors, including to measure the galaxy two-point angular 
correlation function (Connolly et al. 2002) and the galaxy 
angular power spectrum (Tegmark et al. 2002). With later 
SDSS data releases, new constraints for either Galactic ex- 
tinction or seeing were adopted, as predicated by a correla- 
tion function (Ross et al. 2006) or an angular power spec- 
trum (Hayes et al. 2012). More recently, Ross et al. (2011) 
have performed a detailed analysis of the effects of system- 
atics in the SDSS DR8 on the clustering of luminous red 
galaxies, in particular finding that stars have become more 
problematic in this newer data release. As a result, in this 
section we perform a detailed study of different systematic
effects in the SDSS DR7 main galaxy sample. We note
that all of these tests are done in two dimensions, and can,
therefore, be applied to any angular measurement of a
two-dimensional survey data set.

4.1 Pixelisation 

In order to quantify certain discrete systematic effects, we
must sample the galaxy distribution on similar physical
scales as the relevant systematic effects. To accomplish this,
we divide the relevant data into small pixels, or cells, and
measure the fluctuations of a particular systematic effect
(e.g., seeing or reddening) across this distribution of pixels.
Given the distinct scanning strategy of the SDSS,
a specialized, pseudo-rectangular, approximately equal-area
pixelisation strategy was developed by Tegmark, Xu,
and Scranton (SDSSPix^4) that works in SDSS λ/η coordinates
(Stoughton et al. 2002). As a result, we use SDSSPix
to quantify the density of sources within the SDSS stripe-based
geometry for all relevant systematic tests.

SDSSPix has been used to measure the correlation func- 
tion for a pixelised SDSS sample (see, e.g., Scranton et al. 
2002; Ross et al. 2006), but Hayes et al. (2012) demon- 
strated that SDSSPix can bias a clustering measurement 
since the pixels are not the same size across a given stripe 
(the ratio of the pixel height to the pixel width decreases 
towards the ends of a stripe). As a result, we follow Hayes 
et al. (2012) and explore the use of a second pixelisation
scheme, HEALPix, to compute our pixelised correlation
functions. HEALPix was developed by Gorski et al. (2005)
and works in any spherical coordinate system. HEALPix creates
12 equal-area curvilinear base patches, from which
pixels are generated at higher resolutions with either a RING
or NESTED numbering scheme.

To decide which pixelisation scheme is optimal for our
systematic tests, we pixelate the SDSS DR7 with both
schemes, using SDSSPix at resolution 320 and HEALPix at
resolution 2048 (these resolutions produce pixels of nearly
equal area: 3.10 square arcminutes for SDSSPix and 2.95 square
arcminutes for HEALPix). We compute the two-point angular
correlation function for the SDSS DR7 data by using
the point-to-point method described in §5.3 and the
pixel based method described in §4.5.1. The results from
all three methods are directly compared in the top panel 
of Figure 4, while the bottom panel compares the ratio of 
the pixel based methods to the point-to-point method. From 
this figure, in particular the ratio plot in the bottom panel, 
we see that SDSSPix systematically underestimates the cor- 
relation function, which becomes more severe at smaller an- 
gles (we believe this is a manifestation of the changing pixel 
shape). As a result, we adopt the HEALPix scheme for all 
pixel based systematic correlation function tests. 
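
As a quick check of the HEALPix pixel area quoted above, the sketch below uses the fact that a HEALPix map at resolution parameter N_side divides the sphere into 12 N_side^2 pixels of equal area. It assumes that "resolution 2048" in the text refers to N_side = 2048; the function name is hypothetical.

```python
import math

def healpix_pixel_area_arcmin2(nside):
    """Area of a single HEALPix pixel in square arcminutes.

    HEALPix splits the sphere into 12 * nside**2 equal-area pixels.
    """
    n_pix = 12 * nside ** 2
    deg2_per_steradian = math.degrees(1.0) ** 2
    full_sky_arcmin2 = 4.0 * math.pi * deg2_per_steradian * 3600.0
    return full_sky_arcmin2 / n_pix

area = healpix_pixel_area_arcmin2(2048)   # approximately 2.95 square arcminutes
```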



4.2 Density Fluctuations Among Stripes 

The SDSS observed data along great circles, known
as stripes, which are identified by their stripe number.
To explore the effects of this observing strategy on the
uniformity of the full galaxy sample, we examined the
uniformity of the galaxy counts, including as a function of
magnitude, across these different stripes. For this test, we first
used the SDSS algorithm to cut the full sample into the
thirty-seven constituent stripes present in the SDSS DR7
data^5.

Since all of these stripe observations were deemed to be
photometric, we expect that star-galaxy classification (see,
e.g., §3.3) should be consistent across all stripes. To verify
this assumption, we measured the galaxy density for each
of the thirty-four stripes we use in subsequent analyses (i.e.,
the northern stripes 9-39, and southern stripes 76, 82, 86).
The mean galaxy density for the full sample
is 3324.0 galaxies per square degree, with large density
fluctuations within each stripe, and the variation between
the different stripes is also significant. Similar patterns are
found for the four magnitude subsamples: 17 < r < 18,
18 < r < 19, 19 < r < 20, and 20 < r < 21, with galaxy
densities of 88.9, 278.9, 808.9, and 2147.4 galaxies per square degree,
respectively. One concern raised by these significant fluctuations
is that some fraction of these stripes have systematic
effects. To test this hypothesis, we repeat this analysis
by using the main galaxy sample further restricted to areas
of both good seeing and minimal Galactic extinction as
derived in Section 4.5.

^4 http://dls.physics.ucdavis.edu/~scranton/SDSSPix/
^5 http://cas.sdss.org/dr7/en/help/docs/algorithm.asp?

Figure 4. Top: A comparison between the pixel-based (HEALPix
resolution 2048 and SDSSPix resolution 320) and the point-to-point
based pair count methods used in this paper. Bottom: The
ratio of the above pixel-based correlation to the point-to-point
based correlation. The errors are calculated by propagation of
jackknife errors in quadrature.
Figure 5. Left: A box plot of the galaxy number density for each SDSS DR7 stripe (enumerated along the horizontal axis) restricted to
areas of both good seeing and minimal reddening values, as defined in §4.5, showing the median and the 25th and 75th percentiles.
The dotted line shows the mean galaxy density derived from the full main sample, which is 3493.4 galaxies/square degree. The light
yellow region shows the one sigma Poissonian variation. Right: The same box plot now divided into four magnitude ranges: 17 < r < 18,
18 < r < 19, 19 < r < 20, and 20 < r < 21, along with their respective mean galaxy densities (shown as the dotted line) as derived from
the full main sample, which are 92.8, 292.3, 849.2, and 2259.0 galaxies per square degree, respectively.



We present our results in Figure 5, a box plot of the 
galaxy density for each of these thirty-four stripes. In this 
type of plot, the upper and lower edges of the box indicate
the 75th and 25th percentiles of the distribution and the central
line indicates the median value. In this figure, the left-hand
panel shows the total galaxy density for a given stripe which 
has been restricted to areas of both good seeing and minimal 
Galactic extinction. Overplotted as a dotted line is the mean 
galaxy density across the entire main sample, along with the 
one-sigma range (assuming Poissonian fluctuations), which 
is shown by the yellow bar. The right-hand panel presents, 
in a similar manner, the galaxy number density as a function 
of SDSS stripe for four magnitude subsamples: 17 < r < 18, 
18 < r < 19, 19 < r < 20, and 20 < r < 21. 
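
The per-stripe box-plot statistics described above can be sketched as follows. This illustration assumes that the galaxy density is sampled in equal-area pixels, each labeled with its stripe number; the function name `stripe_density_summary`, the pixel area, and the toy counts are hypothetical, and the sketch does not reproduce the Poissonian band shown in Figure 5.

```python
import numpy as np

def stripe_density_summary(stripe_id, pixel_counts, pixel_area_deg2):
    """Return {stripe: (25th, 50th, 75th percentile)} of the pixel densities.

    Densities are in galaxies per square degree.
    """
    stripe_id = np.asarray(stripe_id)
    density = np.asarray(pixel_counts, dtype=float) / pixel_area_deg2
    summary = {}
    for stripe in np.unique(stripe_id):
        q25, q50, q75 = np.percentile(density[stripe_id == stripe], [25, 50, 75])
        summary[int(stripe)] = (float(q25), float(q50), float(q75))
    return summary

# Toy data: three pixels in each of two stripes, each pixel 0.01 square degrees.
stripe_id = [9, 9, 9, 10, 10, 10]
pixel_counts = [10, 20, 30, 20, 20, 20]
summary = stripe_density_summary(stripe_id, pixel_counts, pixel_area_deg2=0.01)
```

In this toy example both stripes share the same median density, but stripe 9 has a much wider interquartile range, which is the kind of within-stripe fluctuation the box plot is meant to expose.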

These results indicate that the corrections made by 
our seeing and reddening cuts are significant. They produce 
number density distributions that show less variation be- 
tween stripes and smaller fluctuations within each stripe, 
and the number densities are higher than the unmasked 
data. The small variations between the different stripes
reflect the variation in the clustering pattern of galaxies
across the sky (note that we explicitly present the clustering 
difference between stripes in Figure 12 in §4.6). In addition, 
these variations are generally consistent with random fluc- 
tuations, both in the individual magnitude ranges and the 
full main sample. As a result, these two systematic effects, 
seeing and Galactic extinction, do induce systematic signals 
that can be removed from our galaxy sample by using the 
appropriate restrictions. Since these restrictions remove the 
vast majority of the data from stripes 42, 43, and 44, in 
the end we simply remove these stripes entirely from the 
clustering analyses of the main galaxy sample. 

4.3 Seeing Variations 

To determine the seeing as a function of spatial location, we 
use the effective area of the point-spread function for each
measured survey field to determine the relevant seeing values
(see http://www.sdss.org/dr7/algorithms/masks.html). As
described in §4.5.1, we pixelate the entire SDSS DR7 footprint
by using SDSSPix at resolution 128, and assign each pixel the
appropriate stripe number, the λ/η coordinate of the pixel
center, and the relevant seeing and reddening values. We
present the calculated seeing values as a
function of lambda (i.e., the SDSS longitude coordinate) for
each stripe in the SDSS DR7 northern contiguous region and 
the three separate southern stripes in Figure 6. The bottom 
panel contains the contiguous northern hemisphere stripes 
9-39, while the top panel contains the northern stripes 42- 
44 and the three southern stripes: 76, 82, and 86. Overall, 
the seeing for all stripes generally remains fairly smooth, as
expected, with most seeing values below 1″.5. Using this
as a canonical value, only stripe 43 was observed primarily
in less than ideal conditions.

In general, we want to minimize the effect of a
systematic on our clustering measurements while maximizing
the number of sources (or, equivalently, the observed area)
available for analyses. Using the pixelised SDSS DR7 map,
we calculate the survey area as a function of seeing, which
we display as the differential area in the top panel of
Figure 7, and as the cumulative area in the middle panel of
Figure 7. From this figure, we see that pixels with seeing values
smaller than 1″.2 contain approximately half of the total
observed area, while the pixels with seeing values smaller than
1″.5 contain almost the entire observed area. As a result, the
majority of the survey area will be retained by using a seeing
cut between 1″.2 and 1″.5.
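
The trade-off between a seeing cut and the retained area can be illustrated with a short sketch. It assumes equal-area pixels, so that the retained area fraction is simply the fraction of pixels below the cut; the function name and the toy seeing values are hypothetical and are not taken from the survey data.

```python
def retained_area_fraction(pixel_seeing, cut):
    """Fraction of equal-area pixels whose seeing (arcsec) is below `cut`."""
    if not pixel_seeing:
        raise ValueError("need at least one pixel")
    kept = sum(1 for s in pixel_seeing if s < cut)
    return kept / len(pixel_seeing)


# Hypothetical seeing values (arcsec), one per equal-area pixel.
toy_seeing = [1.05, 1.10, 1.15, 1.18, 1.25, 1.30, 1.35, 1.45, 1.60, 1.80]

for cut in (1.2, 1.5, 2.0):
    print(cut, retained_area_fraction(toy_seeing, cut))
```

Evaluating the function on a grid of cuts gives the cumulative area curve; differencing adjacent values gives the differential one.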

Next, we explore how the galaxy number density de- 
pends on the seeing. To obtain these values, we augment 
our pixelised SDSS DR7 map with the galaxy density for 
each pixel. We plot the binned, differential galaxy number 
counts as a function of seeing in four magnitude ranges in 



http://www.sdss.org/dr7/algorithms/masks.html
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Figure 6. A heat map showing the average seeing values as a 
function of the SDSS lambda (i.e., longitude) coordinate for all 
thirty-seven stripes in the SDSS DR7. The bottom panel shows 
the northern hemisphere stripes 9-39, while the top panel shows 
stripes 42-44, and the southern hemisphere stripes: 76, 82, and 
86. For convenience, the three southern hemisphere stripes are 
shifted in lambda to align with northern stripes. The final value 
we use to remove the systematics from seeing is indicated in the 
colorbar at the bottom of the figure with a vertical magenta line. 



the bottom panel of Figure 7. In this figure, the galaxy
densities fluctuate strongly at small seeing values, and
these fluctuations quickly decrease as we include more area.
By looking at this figure in conjunction with the differential
area in the top panel, we can see that the galaxy density
at low seeing values oscillates because of the small number
of pixels with very small seeing values. Likewise, the
variation of the galaxy number density increases at higher
seeing values because there are few pixels with
higher seeing values.

As shown in the figure, the galaxy number density
decreases at large seeing values. At a seeing value of ~ 1″.5, the
differential galaxy number density is 80% of the density at
smaller seeing. This decrease can be understood because, as
the seeing increases, star/galaxy classification becomes more
difficult owing to the atmospheric blurring of the source light
profiles. This effect decreases the galaxy number counts in
each pixel and, therefore, decreases the overall galaxy
density. If we require the differential galaxy number density
to remain above 80% of its value at smaller seeing, this figure
indicates that a maximum seeing cut
at 1″.5 should be used; however, the exact value to be used
is best determined by a cross-correlation measurement as
discussed in Section 4.5.



4.4 Reddening Variations 

Galactic extinction (or reddening) systematically dims ob- 
jects, and the spatial distribution of the dust that causes this 
obscuration within our Galaxy varies across the sky. Thus, 
to determine an acceptable limit for this systematic effect, 
we follow a procedure similar to the one outlined in
Section 4.3, where we pixelate the sky (as described in
Section 4.5.1). In this case, however, we start by using the
reddening map of Schlegel et al. (1998) to quantify the Galac-
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Figure 7. Top: The differential unmasked area as a function of 
seeing. Middle: The ratio of the cumulative unmasked area to the
total survey area as a function of seeing. Bottom: The differential galaxy
number density as a function of seeing. The four horizontal lines
are the mean densities of the full sky coverage for four magnitude
bins from the right panel of Figure 5. The error bars are
Poissonian fluctuations in each seeing bin. The vertical dotted line shows
the seeing cut that we use for our final galaxy catalog.



tic extinction as a function of the SDSS lambda coordinate 
for each stripe in the SDSS DR7 northern hemisphere and 
the three separate southern stripes, as shown in Figure 8. 
These two observed regions are centered near the northern 
and southern Galactic poles, which are both regions of low 
Galactic extinction. We, therefore, expect a priori that all of 
these stripes should generally have higher reddening values 
at their endpoints in comparison to their midsection, which 
is the trend that is generally seen in Figure 8. 

As discussed in §4.3, we want to maximize the retained 
survey area, while minimizing the effects of the systematic, 
in this case Galactic extinction, on our clustering measure- 
ments. Using this pixelised reddening map, we calculate the 
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Figure 8. A heat map showing the average reddening values as 
a function of the SDSS lambda (i.e., longitude) coordinate for all 
thirty-seven stripes in the SDSS DR7. The bottom panel shows 
the northern hemisphere stripes 9-39, while the top panel shows 
stripes 42-44, and the southern hemisphere stripes: 76, 82, and 
86. For convenience, the three southern hemisphere stripes are 
shifted in lambda to align with northern stripes. The final value 
we use to remove the systematics from reddening is indicated in 
the colorbar at the bottom of the figure with a vertical magenta 
line. 



survey area as a function of reddening, which we display as 
the differential area in the top panel of Figure 9, and as the 
cumulative area in the middle panel of Figure 9. From this 
figure, we see that pixels with reddening values less than 0.1 
include nearly 75% of the survey area, while reddening val- 
ues less than 0.2 include nearly all of the survey. As a result, 
the majority of the survey area can be maintained by using 
a reddening cut between 0.1 and 0.2. 

Next, we explore how the galaxy number density varies 
with Galactic extinction. We plot the binned galaxy num- 
ber density as a function of reddening in four magnitude 
ranges in the bottom panel of Figure 9. For all magnitude 
ranges, the scatter in the distribution increases for redden- 
ing values larger than 0.2, indicating that there are few pix- 
els with reddening values in this range. On the other hand, 
at small reddening values, i.e., below 0.1 magnitudes, the 
galaxy density increases as the reddening value increases. As 
the value increases, the amount of survey area included also 
increases, and we eventually reach a nearly steady galaxy 
density around a reddening value of 0.1. We therefore con- 
clude that we will want to make a reddening cut somewhere 
between 0.1 and 0.2, but once again we will quantify the 
exact value by using a cross-correlation measurement as dis- 
cussed in Section 4.5. 
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Figure 9. Top: The differential unmasked area as a function of 
reddening. Middle: The ratio of the cumulative unmasked area to the
total survey area as a function of reddening. Bottom: The differential
galaxy number density as a function of reddening. As in
Figure 7, the four horizontal lines are the mean densities of the full
sky coverage for four magnitude bins from the right panel of
Figure 5. The error bars are Poissonian fluctuations in each reddening
bin. The vertical dotted line shows the reddening cut that we use
for our final galaxy catalog.



4.5 Cross-Correlations - Galaxy Density against
Seeing and Reddening

In the previous two subsections, we determined the opti- 
mal ranges for the values of both seeing and Galactic ex- 
tinction that would minimize their systematic effects on our 
correlation measurements. In this section, we focus on
determining the actual values for each of these systematic
effects, which we accomplish by calculating the galaxy-seeing
and galaxy-reddening cross-correlation functions. To measure
these correlation functions, we first pixelate the sky so that
we can calculate the pixel cross-correlation function as
described in Section 4.5.1. Ideally, we would identify a systematic
value that produces a flat cross-correlation function that is
consistent with zero on both small and large scales. In practice,
some residual will remain; therefore, we measure the
cross-correlation function for different values of each
systematic in order to find the optimal value.
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Figure 10. Left: The galaxy-seeing cross-correlation functions for 17 < r < 21. The bolded black square points and error bars represent 
the preferred seeing cut of 1″.5. Right: The galaxy-reddening cross-correlation functions for 17 < r < 21. The bolded black square 
points and error bars represent the preferred reddening cut of 0.13. The error bars in the two panels are typical of the correlation functions 
calculated by using the other seeing or reddening values. 



4.5.1 Cross-Correlation Function Estimators 

To determine the optimal data sample for our analysis, we 
need to quantify the specific data cuts we employ to mini- 
mize systematic effects on our measurement. In particular, 
we wish to minimize the effects of seeing and Galactic ex- 
tinction, or reddening. As demonstrated by Scranton et al. 
(2002), this can be accomplished by measuring the two-point 
angular cross-correlation function between galaxies and the 
relevant systematic. As both reddening and seeing are not 
observed as continuous quantities, however, we must first 
pixelate the sky by using the HEALPix scheme as described 
in Section 4.1. The main caveat with this approach is that 
to measure cross-correlations for these systematics, we must 
adopt pixels that are smaller than the characteristic scale 
of the observed systematic effect. Because each SDSS scan
line is approximately 0°.21 wide and 160° long, we
expect the systematic effect due to seeing to be bounded by
the width of a single SDSS scan line, which should also be
less than the image frame size (the frame size is described
at http://www.sdss.org/dr7/instruments/imager/, and
is about 0.0337 square degrees). The reddening values
published by the SDSS are derived from the Schlegel et al. (1998)
maps, which have an even larger pixel size. Thus, the min- 
imum pixel area we use for our cross-correlation measure- 
ments must be less than the image frame size, or 0.0337 sq. 
deg. As a result, we use HEALPix resolution 2048 to pixe- 
late the SDSS DR7 data, which corresponds to a pixel size 
of 0.00082 square degrees. 
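
As a quick check of the quoted pixel size, the following sketch computes the area of a single HEALPix pixel. It assumes that "resolution 2048" denotes the HEALPix parameter N_side = 2048, so that the sphere is divided into 12 N_side^2 equal-area pixels; the function name is hypothetical.

```python
import math


def healpix_pixel_area_sq_deg(nside):
    """Area of one HEALPix pixel in square degrees (12 * nside**2 equal-area pixels)."""
    full_sky_sq_deg = 4.0 * math.pi * (180.0 / math.pi) ** 2  # about 41253 sq. deg.
    return full_sky_sq_deg / (12 * nside ** 2)


# Assumption: "resolution 2048" corresponds to nside = 2048.
area = healpix_pixel_area_sq_deg(2048)
print(f"{area:.5f} sq. deg. per pixel")
print(f"pixels per 0.0337 sq. deg. frame: {0.0337 / area:.1f}")
```

Under this assumption the pixel area rounds to the 0.00082 square degrees quoted above, so each image frame is covered by several tens of pixels.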

We next compute both the number of galaxies and the 
mean seeing and reddening values for each pixel. Follow- 
ing Scranton et al. (2002), we divide the entire SDSS DR7 
data into 10° x 10° subsamples, and measure the mean num- 
ber density of galaxies per pixel and the mean systematic per 
pixel for each of these subsamples. Using the galaxy counts 



and mean systematic values, we calculate the over/under 
density for both the number of galaxies and the systematic 
for each pixel i within a specific subsample: 



$$\delta_i^g = \frac{n_i^g - \bar{n}^g}{\bar{n}^g}, \qquad \delta_i^s = \frac{\nu_i^s - \bar{\nu}^s}{\bar{\nu}^s}, \tag{1}$$

where $n_i^g$ is the galaxy number density (indicated by g) for
pixel i, and $\nu_i^s$ is the mean value of the systematic being
quantified (e.g., seeing or reddening, indicated by s) for pixel
i. $\bar{n}^g$ and $\bar{\nu}^s$ are the mean galaxy number density per pixel
and the mean value of the specific systematic for the given
subsample, respectively.

By using these pixelised quantities, we use the follow- 
ing estimator to calculate the angular cross-correlation of 
galaxies against a specific systematic quantity: 



$$\omega(\theta) = \frac{\sum_{i,j} \delta_i^g \, \delta_j^s \, \Theta_{ij}}{\sum_{i,j} \Theta_{ij}}. \tag{2}$$

If the separation between pixels i and j falls within the given $\theta$ bin,
$\Theta_{ij}$ is equal to one; otherwise it is zero. The estimator is
calculated between 0°.05 and 10°, in 30 logarithmically spaced
angular bins. Once the estimator has been calculated for
all subsamples, we calculate the mean estimator $\langle \omega(\theta) \rangle$
and the error from all subsamples by using the following
equation:



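$$\langle \omega(\theta) \rangle = \frac{1}{N} \sum_{k=1}^{N} \omega_k(\theta), \tag{3}$$

where $\omega_k(\theta)$ is the estimator measured in the kth subsample
and N ~ 100, which is how many subsamples we use in
this measurement.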
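
The pixel-based estimator described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: pixels are placed on a small flat grid rather than on the sphere, separations are Euclidean, and the function names and toy values are hypothetical. It is not the code used for the measurements.

```python
import math


def overdensity(values):
    """delta_i = (x_i - mean) / mean for the pixels of one subsample."""
    mean = sum(values) / len(values)
    return [(v - mean) / mean for v in values]


def pixel_cross_correlation(positions, galaxies, systematic, bin_edges):
    """omega per bin: sum(dg_i * ds_j * Theta_ij) / sum(Theta_ij)."""
    dg = overdensity(galaxies)
    ds = overdensity(systematic)
    n_bins = len(bin_edges) - 1
    num = [0.0] * n_bins
    cnt = [0] * n_bins
    for i, (xi, yi) in enumerate(positions):
        for j, (xj, yj) in enumerate(positions):
            if i == j:
                continue
            sep = math.hypot(xi - xj, yi - yj)  # flat-sky separation (assumption)
            for b in range(n_bins):
                if bin_edges[b] <= sep < bin_edges[b + 1]:
                    num[b] += dg[i] * ds[j]
                    cnt[b] += 1
                    break
    return [n / c if c else float("nan") for n, c in zip(num, cnt)]


# Toy 4 x 4 pixel grid with hypothetical galaxy counts and a constant systematic.
positions = [(x, y) for x in range(4) for y in range(4)]
galaxies = [3, 5, 4, 6, 2, 7, 5, 4, 6, 3, 5, 4, 4, 6, 3, 5]
flat_systematic = [1.5] * 16
edges = [0.5, 1.5, 3.0, 5.0]
print(pixel_cross_correlation(positions, galaxies, flat_systematic, edges))
```

A spatially constant systematic has zero over-density in every pixel, so the cross-correlation vanishes in all bins; this is the idealized outcome that the cuts are chosen to approach.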
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4.5.2 Results 

In the left panel of Figure 10, we present the galaxy-seeing 
cross-correlation function for the full sample over the
magnitude range 17 < r < 21. We calculated the pixel
cross-correlation function for seeing values between 1″.0 and 2″.0 in
steps of 0″.1, but show only five correlation functions for
clarity (the other samples show similar trends). This figure
indicates that seeing cuts at or below 1″.5 have minimal
systematic effects, as the cross-correlation function is
mostly consistent with zero, especially at large scales. Since
a seeing cut of 1″.5 keeps more than 90% of the survey data
while minimizing the contaminating cross-correlation signal,
we choose 1″.5 as the final value of our seeing cut. Figure 11
indicates that this signal is much smaller than the galaxy
auto-correlation function measurement $\omega(\theta)$ on all scales, from
0°.05 to ~ 5°.

Likewise, in the right panel of Figure 10, we present the 
galaxy-reddening cross-correlation function for the full
sample over the magnitude range 17 < r < 21. The reddening
cross-correlation function is calculated for both magnitude
samples from 0.1 to 0.2 magnitudes in intervals of 0.01
magnitudes. However, for clarity only five correlation
functions are shown (again, the others follow similar trends).
The cross-correlation functions for all reddening cuts are
consistent with zero within 3σ at
small scales and within 1σ at large scales. We are especially
interested in the large-angle cross-correlation function values
(~ 5°, where the reddening cross-correlation signal is of
similar amplitude to the galaxy correlation). Therefore, we choose
the reddening cut that has the smallest value at 5°, while 
also keeping the majority of the survey area. As a result, we 
select 0.13 magnitudes to be the upper limit for our allowed 
reddening value, which keeps more than 80% of the data. 
We note that as shown in Figure 11, the galaxy-reddening 
cross-correlation signal is well below the value of the galaxy 
auto-correlation function until around 5°. 

We also measure the cross-correlation functions for both
galaxy-seeing and galaxy-reddening in the four magnitude
bins, and find trends similar to those of the full sample. We also
measure the galaxy-star cross-correlation function, which
is below the galaxy auto-correlation function until ~ 5°.
In Figure 11, we show the ratio of the pixelized galaxy-seeing,
galaxy-reddening, and galaxy-star cross-correlation
functions to the pixelized galaxy auto-correlation function.
We find that ~ 5° is the scale where both the galaxy-reddening and
galaxy-star cross-correlation functions become comparable in
magnitude with the galaxy auto-correlation function, while
the galaxy-seeing cross-correlation is always well below the
galaxy signal with small error bars. We discuss the galaxy-star
cross-correlation function in more detail in §6.



4.6 Correlation Function Among Stripes 

Having applied the systematic cuts for reddening and seeing, 
we now complement the technique discussed in Section 4.2 to 
verify the uniformity of our final galaxy sample across SDSS 
stripes. We measure the galaxy auto-correlation functions 
by using the point-to-point technique discussed in §5.3 for 
each individual stripe to quantify the stripe-to-stripe
fluctuations. In Figure 12, we present a box-whisker plot for the
galaxy auto-correlation functions of the thirty-one north- 
ern stripes 9-39, and three southern stripes: 76, 82, and 86. 
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Figure 11. The galaxy-galaxy auto-correlation function com- 
pared to the galaxy-reddening, galaxy-seeing, and galaxy-star 
cross-correlation functions for galaxies and stars with magnitudes 
in the range 17 < r < 21 with seeing < 1″.5 and reddening 
< 0.13. These systematic signals are well below the galaxy auto- 
correlation function until ~ 5°. 
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Figure 12. A box-whisker plot of the stripe galaxy angular cor- 
relation functions in the magnitude range 17 < r < 21 for the 
thirty-one northern stripes 9-39, and three southern stripes: 76, 
82, and 86. 

In this type of plot, the box shows the span of the cen- 
tral 50% of the data while the whiskers show the minimum 
and maximum limits of the data (in this case the galaxy 
auto-correlation function across all stripes at a given angu- 
lar resolution). 

As indicated by the whiskers, there are some variations 
across the different stripes, which is expected since the clus- 
tering of galaxies will vary across the sky. We see exactly this 
type of variation in the density of galaxies across the same 
SDSS stripes as shown in Figure 5. Taken together, these
results provide evidence that our final, masked galaxy sample
is sufficiently uniform across the specified SDSS footprint 
for our angular correlation analysis. 



5 THE SDSS GALAXY ANGULAR 
CORRELATION FUNCTION 

5.1 The Angular Correlation Function Estimator 

After concluding the systematic tests and defining the fi- 
nal galaxy sample as detailed in Section 4, we next focus 
on measuring the clustering of the galaxy sample by us- 
ing the two-point galaxy angular correlation function. The 
two-point galaxy angular correlation function calculates the 
excess probability over a random distribution that given one 
galaxy at a specific location, another galaxy will be found 
within a specific angular distance (Peebles 1980) . Given such 
a probabilistic definition, it is not surprising that to deter- 
mine this function we require a large number of random 
points. Therefore, we construct a large sample of random
points (the total number of random points used in any
measurement is always at least ten times the size of the
individual galaxy sample being analysed) that both lie within
the SDSS theoretical footprint and are restricted
to areas of the sky that satisfy the systematic cuts discussed
in Section 4.5.

With these random points, we measure the two-point 
galaxy correlation function by using the Landy & Szalay 
(1993) estimator: 



$$\omega(\theta) = \frac{N_{dd} - 2N_{dr} + N_{rr}}{N_{rr}}, \tag{4}$$

where $N_{dd}$ is the normalized number of galaxy-galaxy pairs
counted within a given angular separation bin of $\theta \pm \delta\theta$ (e.g.,
over the entire SDSS DR7 galaxy sample), and $N_{dr}$ and
$N_{rr}$ are the normalized numbers of galaxy-random pairs and
random-random pairs, respectively. Unless stated otherwise,
we calculate the two-point galaxy angular correlation function
in thirty angular bins, spaced logarithmically between
0°.005 and 10°.
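
For small catalogs, the Landy & Szalay estimator can be evaluated by brute force, which is a useful reference when validating a faster code. The sketch below is a minimal illustration, not the implementation used in this paper: it counts every pair directly, normalizes each count by the number of possible pairs, and uses hypothetical function names and toy catalogs.

```python
import bisect
import math
import random


def separation_deg(p, q):
    """Great-circle separation in degrees between two (ra, dec) points in degrees."""
    ra1, dec1, ra2, dec2 = (math.radians(v) for v in (p[0], p[1], q[0], q[1]))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))  # haversine formula


def binned_pairs(a, b, edges, same):
    """Histogram of pair separations; `same` counts each pair of one catalog once."""
    counts = [0] * (len(edges) - 1)
    for i, p in enumerate(a):
        for q in (a[i + 1:] if same else b):
            k = bisect.bisect_right(edges, separation_deg(p, q)) - 1
            if 0 <= k < len(counts):
                counts[k] += 1
    return counts


def landy_szalay(data, randoms, edges):
    """(N_dd - 2 N_dr + N_rr) / N_rr per bin, with normalized pair counts."""
    nd, nr = len(data), len(randoms)
    dd = binned_pairs(data, data, edges, same=True)
    rr = binned_pairs(randoms, randoms, edges, same=True)
    dr = binned_pairs(data, randoms, edges, same=False)
    omega = []
    for d, x, r in zip(dd, dr, rr):
        n_dd = d / (nd * (nd - 1) / 2)
        n_dr = x / (nd * nr)
        n_rr = r / (nr * (nr - 1) / 2)
        omega.append((n_dd - 2 * n_dr + n_rr) / n_rr if r else float("nan"))
    return omega


# Toy catalogs on a small patch near the equator (hypothetical values).
random.seed(0)
patch = lambda n: [(random.uniform(0.0, 2.0), random.uniform(-1.0, 1.0)) for _ in range(n)]
edges = [0.05 * 2 ** k for k in range(6)]  # logarithmic bins from 0.05 to 1.6 degrees
print(landy_szalay(patch(60), patch(300), edges))
```

The cost of this direct approach grows with the square of the catalog size, which is why it is practical only as a cross-check on small samples.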



5.1.1 Different Correlation Estimators 

Besides the Landy & Szalay (1993) estimator presented in 
the previous section, we have explored three other estimators 
for the two-point angular correlation function. First, we have 
tried the original Peebles estimator (Peebles & Hauser 1974): 



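$$\omega(\theta)_{PH} = \frac{DD/N_d^2 - RR/N_r^2}{RR/N_r^2}. \tag{5}$$

Second, we have tried a similar estimator developed by Davis
& Peebles (1983):

$$\omega(\theta)_{DP} = \frac{DD/N_d - DR/N_r}{DR/N_r}. \tag{6}$$

Finally, we have tried the following estimator developed
by Hamilton (1993):

$$\omega(\theta)_{H} = \frac{DD \cdot RR - DR \cdot DR}{DR \cdot DR}. \tag{7}$$
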

In all three of the above equations, DD is the galaxy-galaxy 
pair counts within the given angular bin, while DR and RR 
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Figure 13. The ratio of the Peebles & Hauser (1974) estimator 
(open squares), the Davis & Peebles (1983) estimator (crosses), 
and the Hamilton (1993) estimator (filled triangles) to the Landy 
& Szalay (1993) estimator. The errors for each estimator are cal- 
culated by using 32 jackknife resamplings. 



represent the bin counts of galaxy-random pairs and
random-random pairs, respectively. Likewise, $N_d$ and $N_r$ are the total
numbers of points in the galaxy sample and random sample and
are used to properly normalize the appropriate pair count.

We compare these three estimators with the stan- 
dard Landy & Szalay (1993) estimator in Figure 13. As has 
been shown previously (Kerscher et al. 2000), the Hamilton 
(1993) estimator is in close agreement with the Landy & 
Szalay (1993) estimator, but has slightly larger error bars. 
On the other hand, both the Peebles & Hauser (1974) and
the Davis & Peebles (1983) estimators overestimate the galaxy
clustering at small scales and have larger error bars over all scales
than the Landy & Szalay (1993) estimator. As a result, we
utilize the Landy & Szalay (1993) estimator as
appropriate throughout this paper.
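
To make the relation between the four estimators concrete, the following sketch evaluates all of them for a single angular bin from raw pair counts. It assumes that DD, DR, and RR are counted over all ordered pairs, so that the expected counts for an unclustered sample scale as N_d^2, N_d N_r, and N_r^2, respectively; the function name and the toy counts are hypothetical.

```python
def correlation_estimators(dd, dr, rr, n_d, n_r):
    """Four two-point estimators for one angular bin, from raw pair counts."""
    peebles_hauser = (dd / n_d ** 2 - rr / n_r ** 2) / (rr / n_r ** 2)
    davis_peebles = (dd / n_d - dr / n_r) / (dr / n_r)
    hamilton = (dd * rr - dr * dr) / (dr * dr)
    # Landy & Szalay with each count normalized by its number of possible pairs.
    n_dd, n_dr, n_rr = dd / n_d ** 2, dr / (n_d * n_r), rr / n_r ** 2
    landy_szalay = (n_dd - 2 * n_dr + n_rr) / n_rr
    return {
        "peebles_hauser": peebles_hauser,
        "davis_peebles": davis_peebles,
        "hamilton": hamilton,
        "landy_szalay": landy_szalay,
    }


# Toy counts: 100 galaxies, 1000 randoms, 50% excess of galaxy-galaxy pairs.
print(correlation_estimators(dd=150, dr=1000, rr=10000, n_d=100, n_r=1000))
```

When the galaxy-random counts match their unclustered expectation exactly, as in this toy case, all four expressions return the same value; they differ in how they respond to fluctuations in DR and RR, which is what drives the different error bars.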



5.2 Errors and Curve Fitting 

To calculate the errors on our two-point galaxy angular
correlation function measurements, we adopt the 'delete-one
jackknife' method. We subdivide our full galaxy sample into
32 subsamples. Leaving out one subsample at a time, we calculate
the two-point galaxy angular correlation function for the
data in the remaining thirty-one subsamples. This allows
us to construct a jackknife-defined covariance matrix that
both quantifies the homogeneity of our galaxy sample and
allows us to optimally fit models to our correlation function
measurements.

The covariance matrix for the N = 32 jackknife samples 
is determined by using the formula presented by Scranton 
et al. (2002), but see also Zehavi et al. (2002); Myers et al. 
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Figure 14. Left: Computational time for tlie calculation of the two-point galaxy auto-correlation function by using our specialized code 
as a function of the number of galaxies. The fitted line shows that the runtime scales as $N^{1.35}$. Right: The computational time as a function 
of the number of processors, for approximately 2 million galaxies. The line shows the ideal runtime as the number of processors increases. 



(2005, 2007):

$$C_{i,j} = \frac{N-1}{N} \sum_{k=1}^{32} [\omega_k(\theta_i) - \omega(\theta_i)] [\omega_k(\theta_j) - \omega(\theta_j)], \tag{8}$$

where $\omega$ is the value from the full galaxy sample, $\omega_k$
refers to the value of the correlation measurement obtained
by omitting the kth subsample of data, and i and j are
respectively the ith and jth angular bins. The jackknife bin
errors, $\sigma_i$, can be obtained from the diagonal elements of
the covariance matrix, i.e.,

$$\sigma_i^2 = C_{i,i}. \tag{9}$$
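
A delete-one jackknife covariance matrix can be assembled as in the sketch below. The prefactor (N - 1)/N is the standard delete-one normalization and is an assumption here, as are the function names and the toy numbers; the inputs are the correlation function of the full sample and the N measurements obtained by omitting one subsample at a time.

```python
import math


def jackknife_covariance(omega_full, omega_jack):
    """Delete-one jackknife covariance; omega_jack[k][i] omits subsample k, bin i."""
    n = len(omega_jack)
    n_bins = len(omega_full)
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for omega_k in omega_jack:
        for i in range(n_bins):
            d_i = omega_k[i] - omega_full[i]
            for j in range(n_bins):
                cov[i][j] += d_i * (omega_k[j] - omega_full[j])
    scale = (n - 1) / n  # assumed standard delete-one prefactor
    return [[scale * c for c in row] for row in cov]


def bin_errors(cov):
    """Per-bin errors from the diagonal of the covariance matrix."""
    return [math.sqrt(cov[i][i]) for i in range(len(cov))]


# Toy example with two bins and two jackknife measurements (hypothetical values).
cov = jackknife_covariance([1.0, 2.0], [[1.1, 2.0], [0.9, 2.0]])
print(cov, bin_errors(cov))
```

The off-diagonal elements carry the bin-to-bin correlations, which are needed when the full covariance matrix enters a model fit.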



For comparison with previous works (e.g., Connolly 
et al. 2002), we fit our two-point galaxy angular correlation 
measurements with a power-law model: $\omega_m(\theta) = A_\omega \theta^{1-\gamma}$.
To determine the best fit model for each correlation function,
we perform a chi-squared minimization (Press 2002):

$$\chi^2 = \frac{1}{N_{dof}} \sum_{i,j} [\omega(\theta_i) - \omega_m(\theta_i)] \, C^{-1}_{i,j} \, [\omega(\theta_j) - \omega_m(\theta_j)], \tag{10}$$

where $\omega(\theta)$ is the measured two-point galaxy angular
correlation function, $\omega_m(\theta)$ is the model two-point galaxy angular
correlation function, and $C_{i,j}$ are the elements of the
calculated covariance matrix from Equation 8.
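
The power-law fit can be illustrated with a simple grid search over the two parameters. This sketch is not the minimizer used in the paper: the grid search, the choice N_dof = (number of bins) - 2, the function names, and the synthetic measurements are all assumptions made for illustration, and the inverse covariance matrix is taken as a given input.

```python
def power_law(theta, amplitude, gamma):
    """Model omega_m(theta) = A * theta**(1 - gamma)."""
    return amplitude * theta ** (1.0 - gamma)


def chi_squared(theta, omega, inv_cov, amplitude, gamma):
    """Chi-squared per degree of freedom with a full inverse covariance matrix."""
    resid = [w - power_law(t, amplitude, gamma) for t, w in zip(theta, omega)]
    n = len(resid)
    total = sum(resid[i] * inv_cov[i][j] * resid[j]
                for i in range(n) for j in range(n))
    return total / (n - 2)  # assumption: two fitted parameters


def grid_fit(theta, omega, inv_cov, amplitudes, gammas):
    """Return (chi2, amplitude, gamma) with the smallest chi-squared on the grid."""
    return min((chi_squared(theta, omega, inv_cov, a, g), a, g)
               for a in amplitudes for g in gammas)


# Synthetic, noise-free measurements with a diagonal covariance (hypothetical values).
theta = [0.01, 0.1, 1.0, 5.0]
omega = [power_law(t, 0.01, 1.7) for t in theta]
inv_cov = [[(1.0 / (0.1 * omega[i]) ** 2 if i == j else 0.0) for j in range(4)]
           for i in range(4)]
best = grid_fit(theta, omega, inv_cov, [0.005, 0.01, 0.02], [1.5, 1.6, 1.7, 1.8])
print(best)
```

In practice the grid would be replaced by a proper minimizer and the inverse covariance matrix would come from the jackknife estimate, including its off-diagonal terms.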



5.3 The Fast Two-Point Correlation Function 
Calculation 

Historically, the two-point correlation function (hereafter 
2PCF) has been limited by the availability of large data 
sets with sufficient sky coverage and depth to provide a 
fair sample of objects in the Universe. We now live in a 
privileged era when such data sets are or will be available 
thanks to current or planned large-scale surveys such as the
SDSS, the Dark Energy Survey (DES), or the Large Synoptic
Survey Telescope (LSST). With millions or possibly
billions of unique objects, the traditional methods of calculating
the 2PCF become entirely unfeasible, as calculation
times quickly reach years or longer. We therefore present a
technique that leverages a two-dimensional quad-tree structure
to speed up these calculations. A detailed discussion of
this technique can be found in Dolence & Brunner (2008),
and we make our implementation freely available
(http://lcdm.astro.illinois.edu/code).



5.3.1 The Methodology 

Fundamentally, the calculation of the 2PCF involves deter- 
mining how many pairs of data points lie within particular 
distance bins as compared to a Poisson distribution. For a 
data set with N points, the naive approach calls for the
calculation of the distance from each point to all other N - 1
points. Clearly, this approach leads to a computational load
that scales as $O(N^2)$, which proves impractical for large N.
One can improve the calculation by organizing the data into
a two-dimensional quad-tree structure that groups nearby
data points (Moore et al. 2001). This technique gives significant
savings in practice because, instead of calculating the
distance to every point, one can often account for entire groups
of points by looking only at the bounding boxes of the
different groups.
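
The bounding-box idea can be illustrated with a compact sketch. It is not the code described in this paper: it uses a simple median-split binary tree on a flat plane with Euclidean distances, and all names are hypothetical. The key step is that a whole node is accepted or rejected whenever the minimum and maximum distances to its bounding box both fall inside, or both fall outside, the separation bin.

```python
import math
import random


class Node:
    """Axis-aligned bounding box over a group of points, split at the median."""

    def __init__(self, points, leaf_size=8, depth=0):
        xs = [p[0] for p in points]
        ys = [p[1] for p in points]
        self.box = (min(xs), max(xs), min(ys), max(ys))
        self.points = points
        self.count = len(points)
        self.children = []
        if len(points) > leaf_size:
            ordered = sorted(points, key=lambda p: p[depth % 2])
            mid = len(ordered) // 2
            self.children = [Node(ordered[:mid], leaf_size, depth + 1),
                             Node(ordered[mid:], leaf_size, depth + 1)]


def box_distance_range(p, box):
    """Minimum and maximum distance from point p to any location inside box."""
    x0, x1, y0, y1 = box
    dx_min = max(x0 - p[0], 0.0, p[0] - x1)
    dy_min = max(y0 - p[1], 0.0, p[1] - y1)
    dx_max = max(abs(p[0] - x0), abs(p[0] - x1))
    dy_max = max(abs(p[1] - y0), abs(p[1] - y1))
    return math.hypot(dx_min, dy_min), math.hypot(dx_max, dy_max)


def count_within(p, node, r_lo, r_hi):
    """Count points q in node with r_lo <= distance(p, q) < r_hi."""
    d_min, d_max = box_distance_range(p, node.box)
    if d_min >= r_hi or d_max < r_lo:
        return 0              # the whole group lies outside the bin
    if d_min >= r_lo and d_max < r_hi:
        return node.count     # the whole group lies inside the bin
    if not node.children:
        return sum(1 for q in node.points
                   if r_lo <= math.hypot(p[0] - q[0], p[1] - q[1]) < r_hi)
    return sum(count_within(p, child, r_lo, r_hi) for child in node.children)


random.seed(3)
points = [(random.random(), random.random()) for _ in range(300)]
root = Node(points)
print(sum(count_within(p, root, 0.05, 0.2) for p in points))
```

Comparing pairs of nodes rather than a point and a node, as in a dual-tree traversal, extends the same pruning rule and yields further savings.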

We first perform preliminary calculations for all galax- 
ies and random points that convert and organize the data 
into an optimized format for subsequent calculations. The 
pre-computing codes are all serial, as this step in the 2PCF 
calculation is relatively inexpensive. Next, we construct the 
quad-tree for both the galaxies and random points by using 



http://lcdm.astro.illinois.edu/code
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Figure 15. Left: The two-point angular correlation function for galaxies in tlie magnitude range 17 < r < 21. Right: The two-point 
galaxy angular correlation function split by magnitude: 17 < r < 18 (red), 18 < r < 19 (magenta), 19 < r < 20 (blue), and 20 < r < 21 
(green). Overplotted for all four correlation function measurements are the best fit power laws, $\omega(\theta) = A_\omega \theta^{1-\gamma}$; the individual fit values of 
$A_\omega$ and $\gamma$ are given in Table 3. In both plots we draw a line at $\omega(\theta) = 0.00053$, which is the typical scale of the maximum systematic 
contamination (i.e., Galactic extinction) at large angles (see, e.g., Figures 10 and 11). 



a modified kd-tree (Bentley 1975). This produces a balanced
tree with minimal depth that both minimizes the memory 
required to store the tree and leads to an efficient tree traver- 
sal in later computations. 

The algorithm proceeds by computing the minimum 
size bounding box that contains all the data (i.e., root node), 
which is subsequently subdivided in a recursive manner into 
new subsamples (i.e., child nodes). We quantify the mini- 
mum size bounding box for each subsample, and this pro- 
cess continues until a child node contains fewer data points 
than a preset limit. We also consider the jackknife resam- 
pling when building the trees since we must be sure to select 
data and random points that occupy the same volume for a 
given sample for each jackknife. 

Given the tree structure above, we must be able to 
quickly determine the minimum and maximum angular sep- 
arations between two nodes or a point and a node. Since we 
have stored the cosine and sine of the angular size of each 
node as well as their centers, the requisite information can 
be computed without ever using a trigonometric function 
evaluation. 
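
One way to see how separation bounds can be obtained without trigonometric calls is sketched below. It assumes that each node is summarized by a unit vector to its center and by the cosine and sine of an angular radius that encloses all of its points; this cap-shaped model, the function names, and the example values are illustrative assumptions rather than a description of the actual code. Trigonometric functions are used only once, when the coordinates are converted to unit vectors.

```python
import math


def unit_vector(ra_deg, dec_deg):
    """Convert (ra, dec) in degrees to a unit vector; done once when loading data."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra), math.cos(dec) * math.sin(ra), math.sin(dec))


def cos_separation_bounds(point, center, cos_a, sin_a):
    """Return (cos of max separation, cos of min separation) between point and node.

    With d the angle between point and center and a the node radius, the bounds
    are d - a and d + a, expanded with the angle-addition formulas.
    """
    cos_d = max(-1.0, min(1.0, sum(p * c for p, c in zip(point, center))))
    sin_d = math.sqrt(1.0 - cos_d * cos_d)
    cos_min = 1.0 if cos_d >= cos_a else cos_d * cos_a + sin_d * sin_a
    cos_max = -1.0 if cos_d <= -cos_a else cos_d * cos_a - sin_d * sin_a
    return cos_max, cos_min


def node_inside_bin(cos_max, cos_min, cos_lo, cos_hi):
    """True if every separation lies in [theta_lo, theta_hi); cosines are precomputed."""
    return cos_min <= cos_lo and cos_max > cos_hi


# Example: point at (0, 0), node centered at (10, 0) degrees with a 2 degree radius.
a = math.radians(2.0)
bounds = cos_separation_bounds(unit_vector(0, 0), unit_vector(10, 0),
                               math.cos(a), math.sin(a))
print(bounds)
```

Because the cosine decreases monotonically between 0° and 180°, comparing cosines of separations against precomputed cosines of the bin edges is equivalent to comparing the angles themselves.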

To parallelize this algorithm, we note that the compari- 
son of two data sets represented by their trees can be broken 
into subproblems by comparing all nodes at a given level L 
in one tree with the root node of the other. Since we use 
binary trees, this yields $2^L$ subproblems that can be
distributed to multiple processors. We employ a master-slave
arrangement where one processor is responsible for coordi- 
nating the parallel calculation and the remaining processors 
make requests for work as needed. When not handling work 
requests the master process performs smaller amounts of 
work by descending deeper into the tree, which assures that 
it frequently checks for work requests but is not idle when no 



requests have been posted. In the current implementation, 
all processors have direct access to all the data which limits 
communication to single integer tags identifying particular 
subproblems. 

5.3.2 The Performance 

To demonstrate the performance and scaling of this
implementation, we ran each correlation function ten times and
computed the mean time and the standard deviation of the
ten separate calculations. Figure 14 shows how the code
scales with an increasing number of galaxies by using only
one processor (left panel) and with an increasing number of
processors by using two million galaxies (right panel). For
each galaxy sample, we use random data with ten times more
points in the same sky region and we compute the angular
correlations to 10°. The left plot in Figure 14 shows that
the runtime scales as $N^{\alpha}$, where $\alpha$ ~ 1.35. The right plot
shows how the running time scales with the number of
processors, with 2 million points in the galaxy sample. These two
plots illustrate that the parallel algorithm we present above
computes the 2PCF efficiently over a wide range of angles
for large data sets.
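
A scaling exponent of this kind is usually obtained from a straight-line fit in log-log space. The sketch below shows one such fit; the function name is hypothetical and the timings are synthetic values generated from an assumed power law, not measurements from this work.

```python
import math


def fit_power_law_scaling(sizes, times):
    """Least-squares fit of log(t) = log(c) + alpha * log(N); returns (alpha, c)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    return alpha, math.exp(y_mean - alpha * x_mean)


# Synthetic timings that follow t = c * N**1.35 exactly (hypothetical values).
sizes = [1e4, 1e5, 1e6, 2e6]
times = [2e-6 * n ** 1.35 for n in sizes]
alpha, prefactor = fit_power_law_scaling(sizes, times)
print(alpha, prefactor)
```

With real timings, the standard deviation of the repeated runs can be used to weight the points or to judge whether a single power law is an adequate description.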

5.4 The Angular Correlation Function of DR7 
Galaxies 

By applying the correlation function estimator in Equation 4 
to the full galaxy sample, as defined by the restrictions out- 
lined in Section 4, we measure the two-point galaxy angular 
correlation function for the SDSS DR7. The resulting corre- 
lation function is shown in the left-hand panel of Figure 15. 
By calculating the full sample correlation matrix estimator 
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Table 2. The two-point galaxy angular correlation function measurements for the full galaxy sample and our four magnitude limited 
subsamples: 17 < r < 18, 18 < r < 19, 19 < r < 20, and 20 < r < 21. 



Angle (deg)   17 < r < 21        17 < r < 18        18 < r < 19        19 < r < 20        20 < r < 21

0.006         0.3173 ± 0.0045    1.7583 ± 0.0797    0.8129 ± 0.0220    0.4498 ± 0.0074    0.2456 ± 0.0049
0.007         0.2548 ± 0.0040    1.3099 ± 0.0559    0.6805 ± 0.0182    0.3705 ± 0.0059    0.1995 ± 0.0047
0.009         0.2093 ± 0.0034    1.0344 ± 0.0448    0.5634 ± 0.0149    0.3041 ± 0.0055    0.1646 ± 0.0043
0.012         0.1718 ± 0.0031    0.8975 ± 0.0352    0.4582 ± 0.0112    0.2520 ± 0.0037    0.1349 ± 0.0040
0.016         0.1458 ± 0.0029    0.7198 ± 0.0325    0.3871 ± 0.0113    0.2134 ± 0.0037    0.1166 ± 0.0038
0.020         0.1269 ± 0.0027    0.6310 ± 0.0260    0.3241 ± 0.0087    0.1824 ± 0.0038    0.1022 ± 0.0039
0.026         0.1106 ± 0.0025    0.5108 ± 0.0245    0.2733 ± 0.0093    0.1549 ± 0.0034    0.0904 ± 0.0036
0.033         0.0943 ± 0.0023    0.4274 ± 0.0193    0.2296 ± 0.0067    0.1328 ± 0.0032    0.0779 ± 0.0034
0.043         0.0794 ± 0.0021    0.3619 ± 0.0183    0.1957 ± 0.0068    0.1112 ± 0.0028    0.0660 ± 0.0032
0.056         0.0664 ± 0.0020    0.3019 ± 0.0146    0.1611 ± 0.0061    0.0942 ± 0.0026    0.0554 ± 0.0030
0.072         0.0552 ± 0.0018    0.2564 ± 0.0139    0.1385 ± 0.0052    0.0773 ± 0.0023    0.0463 ± 0.0028
0.092         0.0455 ± 0.0017    0.2091 ± 0.0114    0.1114 ± 0.0046    0.0642 ± 0.0021    0.0384 ± 0.0025
0.119         0.0375 ± 0.0015    0.1723 ± 0.0102    0.0899 ± 0.0041    0.0525 ± 0.0018    0.0318 ± 0.0023


0, 


.153 


0, 


.0306 


± 


0, 


.0013 


0. 


.1419 


± 


0, 


.0087 


0, 


,0726 


± 


0, 


,0035 


0, 


,0425 


± 


0, 


,0016 


0, 


.0260 


± 





.0020 





.197 


0, 


.0250 


± 


0, 


.0012 


0, 


.1133 


± 


0, 


.0074 


0, 


,0585 


± 


0, 


,0030 


0, 


,0346 


± 


0, 


,0015 


0, 


.0213 


± 





.0017 





.254 


0, 


.0209 


± 


0, 


.0011 


0, 


.0929 


± 


0, 


.0061 


0, 


,0480 


± 


0, 


,0027 


0, 


,0291 


± 


0, 


,0014 





.0178 


± 





.0014 


0, 


.327 


0, 


.0178 


± 


0, 


,0011 


0, 


.0781 


± 


0, 


.0054 


0, 


,0400 


± 


0, 


,0025 


0, 


,0249 




0, 


,0014 


0, 


.0153 


± 


0, 


.0011 


0, 


.421 


0, 


.0152 


± 


0, 


.0011 


0, 


.0642 




0, 


.0049 


0, 


,0330 


± 


0, 


,0022 


0, 


,0210 


± 


0, 


,0013 





.0131 


± 





.0010 


0, 


.543 


0, 


.0123 


± 


0, 


.0010 


0, 


.0531 


± 


0, 


.0042 


0, 


,0268 


± 


0, 


,0021 


0, 


,0171 


± 


0, 


,0012 





.0105 


± 





.0009 


0, 


.699 


0, 


.0097 


± 


0, 


.0009 


0, 


.0440 




0, 


.0036 


0, 


,0219 


± 


0, 


,0018 


0, 


,0136 


± 


0, 


,0011 


0, 


.0083 


± 





.0008 


0, 


.901 


0, 


.0078 


± 


0, 


.0008 


0, 


.0367 


± 


0, 


.0032 


0, 


,0179 


± 


0, 


,0016 


0, 


,0108 


± 


0, 


,0010 


0, 


.0066 


± 


0, 


.0007 


1, 


.161 


0, 


.0060 


± 


0, 


.0007 


0, 


.0294 


± 


0, 


.0029 


0, 


,0137 


± 


0, 


,0014 


0, 


,0084 


± 


0, 


,0008 


0, 


.0050 


± 


0, 


.0006 


1, 


.495 


0. 


.0046 


± 


0. 


.0006 


0. 


.0243 




0. 


.0027 


0, 


,0108 


± 


0, 


,0014 


0, 


,0064 


± 


0, 


,0007 


0, 


.0038 


± 





.0005 


1, 


.927 


0. 


.0035 


± 


0. 


.0006 


0. 


.0195 




0. 


.0024 


0, 


,0085 


± 


0, 


,0012 


0, 


,0048 


± 


0, 


,0007 


0, 


.0029 


± 


0, 


.0005 


2, 


.482 


0, 


.0026 


± 


0, 


.0005 


0, 


.0160 


± 


0, 


.0021 


0, 


,0063 


± 


0, 


,0012 


0, 


,0034 


± 


0, 


,0006 





.0021 


± 





.0004 


3 


.198 


0, 


.0018 




0, 


.0005 


0, 


.0124 


± 


0, 


,0020 


0, 


,0045 


± 


0, 


,0011 


0, 


,0023 




0, 


,0006 





.0015 


± 





.0004 


4, 


.120 


0, 


.0012 


± 


0, 


.0004 


0, 


.0080 




0, 


.0020 


0, 


,0029 


± 


0, 


,0010 


0, 


,0015 




0, 


,0006 


0, 


.0010 


± 





.0003 


5, 


.308 


0, 


.0010 


± 


0, 


.0004 


0, 


.0063 


± 


0, 


,0018 


0, 


,0023 




0, 


,0009 


0, 


,0011 


± 


0, 


,0005 


0, 


.0008 




0, 


.0003 


6, 


.838 


0, 


.0007 


± 


0, 


.0003 


0, 


.0052 


± 


0, 


,0017 


0, 


,0019 


± 


0, 


,0008 


0, 


,0010 


± 


0, 


,0004 


0, 


.0006 


± 


0, 


.0003 


8, 


.810 


0, 


.0006 


± 


0, 


.0003 


0, 


.0049 


± 


0, 


.0017 


0, 


,0019 


± 


0, 


,0007 


0, 


,0008 


± 


0, 


,0004 





.0005 


± 





.0003 



Table 3. Parameter values for the power-law model fits, for both
the full galaxy sample and magnitude-limited subsamples.

Magnitude                    $\log_{10} A_\omega$   $1-\gamma$        $\chi^2$/dof

17 < r < 21 (full sample)    -2.120 ± 0.019         -0.720 ± 0.010    5.30
17 < r < 18                  -1.483 ± 0.009         -0.754 ± 0.006    0.76
18 < r < 19                  -1.776 ± 0.014         -0.759 ± 0.008    2.36
19 < r < 20                  -1.983 ± 0.018         -0.731 ± 0.010    5.46
20 < r < 21                  -2.222 ± 0.023         -0.719 ± 0.012    4.26



shown in Equation 8, we obtain a model power-law fit using
Equation 10, finding $\log_{10} A_\omega = -2.12$ with
$\gamma \approx 1.72$ for the full galaxy sample, which is
overplotted on the data shown in Figure 15. We present the
full sample correlation
matrix, which is computed from the covariance matrix (see, 
e.g., Scranton et al. 2002): 

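$$ r(\theta_i, \theta_j) = \frac{C(\theta_i, \theta_j)}{\sqrt{C(\theta_i, \theta_i)\, C(\theta_j, \theta_j)}} \qquad (11) $$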



The correlation matrix for the full galaxy sample is pre- 
sented in Figure 16, and is seen to be highly diagonal. We 
tabulate all covariance matrices in Appendix B. Overall, the 
amplitude of the correlation function is consistent with pre- 
vious results from surveys such as the APM (Maddox et al. 
1990) and the SDSS EDR (Connolly et al. 2002); a more 
detailed comparison is presented in Section 6. 
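
The normalization from a covariance matrix to a correlation matrix
can be sketched in a few lines. The example below is only an
illustration: it assumes the standard definition in which each
covariance element is divided by the square root of the product of
the two corresponding diagonal elements, and it uses the upper-left
3 x 3 block of the sample matrix in Table B1 (angles 8.8101, 6.8383,
and 5.3078 degrees) as input. The function name `correlation_matrix`
is hypothetical.

```python
import math


def correlation_matrix(cov):
    """Normalize a covariance matrix: r_ij = C_ij / sqrt(C_ii * C_jj)."""
    size = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(size)]
        for i in range(size)
    ]


# Upper-left 3 x 3 block of Table B1. The common multiplicative unit
# of the table cancels in the ratio, so it is omitted here.
cov = [
    [0.0094, 0.0078, 0.0081],
    [0.0078, 0.0094, 0.0107],
    [0.0081, 0.0107, 0.0146],
]

r = correlation_matrix(cov)
for row in r:
    print("  ".join(f"{value:.3f}" for value in row))
```

By construction the diagonal elements equal one and the matrix is
symmetric whenever the input covariance matrix is symmetric.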

By following this same procedure, we measure the two- 
point galaxy angular correlation function for four magnitude 
limited samples: 17 < r < 18, 18 < r < 19, 19 < r < 20, 
and 20 < r < 21, which are shown in the right-hand panel of
Figure 15. The actual two-point galaxy angular correlation
measurements are also presented in Table 2 for each angular
bin.

We fit these individual angular correlation functions
following the same technique as described for the full galaxy
sample, but using the jackknife covariance matrix for the
appropriate magnitude range. The best-fit power-law models are
overplotted on the relevant data in Figure 15, and the
power-law fit parameters are tabulated for the full galaxy
sample and each of the four magnitude-limited samples in
Table 3. We find that the amplitudes of these four correlation
functions decrease with increasing magnitude, as expected
since we are sampling intrinsically fainter galaxies that are
known to be clustered less strongly (Scranton et al. 2002;
Connolly et al. 2002).

Figure 16. The correlation matrix for the full galaxy sample;
pixel values range from 0 (uncorrelated; white) to 1 (fully
correlated; black).



6 DISCUSSIONS AND CONCLUSIONS 

In this paper, we present the first complete measurement
of the two-point galaxy angular correlation function for the
SDSS DR7. To make the most precise measurement possible, we
first performed a thorough reanalysis of possible external and
internal sources of error. First, we found that the SDSS DR7
data have a detection completeness of approximately 90% to a
dereddened r-band model magnitude of 21. Second, we
demonstrated that the source classification is 95% complete to
this same magnitude limit. Thus we restrict our galaxy sample
to have r-band magnitudes in the range 17 < r < 21. Next, we
confirmed the overall quality of the SDSS photometric data,
and found that our signal is maximized by restricting the SDSS
DR7 data to those regions of the survey that have seeing
< 1.5″ and reddening < 0.13 mag. With these sample
restrictions, the majority of the data from stripes 42, 43,
and 44 are removed; therefore, for simplicity, we exclude
these three stripes from our final galaxy sample.

We also explored the effect of the SDSS survey strategy
on our measurement, finding that the variations in galaxy
densities and $\omega(\theta)$ across stripes are small.
Therefore, we see no reason to a priori exclude any of the
remaining thirty-four stripes that constitute our final
sample. Finally, we compared the two-point galaxy
auto-correlation function with the two-point cross-correlation
function between galaxies and seeing, and between galaxies and
reddening, finding that the amplitudes of these systematic
errors are well below the measurement of $\omega(\theta)$ on
angular scales from 0.05° to 5°. From these measured
systematics, we suggest that, unless these systematic effects
can be mitigated more effectively, the measurement of galaxy
angular correlation functions from forthcoming large surveys,
such as DES and LSST, will be limited to smaller angular
scales as they will probe intrinsically fainter magnitudes.
One method to mitigate this effect, however, will be to use
photometric redshifts to divide the angular signal into
smaller redshift shells to minimize the projection effects in
measuring $\omega(\theta)$ and thereby increase the amplitude
of the overall signal.

One result of our analysis of different systematics was
that stars do not have a major effect in the SDSS DR7. This
is in direct conflict with the results of Ross et al. (2011),
who demonstrated that stars are one of the dominant
contaminants in clustering measurements of luminous red
galaxies in the SDSS DR8. These differences can be explained
by several facts. First, we explore the effects of the
different systematics on all galaxies, not just luminous red
galaxies, which are generally quite faint in the SDSS imaging
data. Second, SDSS DR7 does not extend to the same low
Galactic latitudes as SDSS DR8, which means DR8 will include
regions of much higher stellar density. Third, we use more
stringent seeing and reddening cuts, which reduce the overall
sky coverage, preferentially to higher Galactic latitudes.
Fourth, we use the official SDSS star/galaxy classification
method, as opposed to a separate neural-network
classification. Finally, the SDSS DR8 has a known photometric
issue that affects the measured color offsets as compared to
the SDSS DR7 photometric pipeline (Ross et al. 2011).

Our final measurement of the two-point galaxy angular
correlation function includes data taken through August 2008,
and demonstrates that both the shape and amplitude of the
two-point galaxy angular correlation function are similar to
(albeit more precise than) previously published results. We
find that the correlation function can generally be described
by a power law, $\omega(\theta) = A_\omega \theta^{1-\gamma}$,
with $\gamma \approx 1.72$ on both small and large scales. The
amplitude of the correlation function decreases toward fainter
magnitudes, which is also in good agreement with previous
results, with $\gamma \approx$ 1.75, 1.76, 1.73, and 1.72 for
magnitude bins 17 < r < 18, 18 < r < 19, 19 < r < 20, and
20 < r < 21, respectively.
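
To make the relation between the fitted parameters and the quoted
slopes concrete, the sketch below evaluates the power-law model from
the Table 3 parameters. It assumes that the angle is expressed in
degrees, so that the fitted amplitude equals the model value at one
degree; this reading appears consistent with the Table 2 measurements
near one degree but is an assumption here. The names `TABLE3` and
`omega_model` are hypothetical.

```python
# Table 3 parameters: (log10 amplitude, 1 - gamma) for each sample.
TABLE3 = {
    "17 < r < 21": (-2.120, -0.720),
    "17 < r < 18": (-1.483, -0.754),
    "18 < r < 19": (-1.776, -0.759),
    "19 < r < 20": (-1.983, -0.731),
    "20 < r < 21": (-2.222, -0.719),
}


def omega_model(theta_deg, log10_amp, one_minus_gamma):
    """Power-law model: omega(theta) = A * theta**(1 - gamma).

    theta_deg is assumed to be in degrees (an assumption of this sketch).
    """
    return 10.0 ** log10_amp * theta_deg ** one_minus_gamma


for sample, (log10_amp, slope) in TABLE3.items():
    gamma = 1.0 - slope
    amp_at_1deg = omega_model(1.0, log10_amp, slope)
    print(f"{sample}: gamma = {gamma:.2f}, omega(1 deg) = {amp_at_1deg:.4f}")
```

Evaluating the model at one degree simply returns the fitted
amplitude, which is the quantity compared across magnitude bins.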

In Figure 17, we compare our galaxy angular correlation
function amplitudes at $\theta$ = 1° for magnitude bins
17 < r < 18, 18 < r < 19, 19 < r < 20, and 20 < r < 21 with
previously published results made from other galaxy catalogs
(note that we have made no effort to correct for the likely
small, and unknown, differences in the various r-band filters
used by the different authors). At brighter magnitudes these
catalogs include the SDSS EDR (Sloan Digital Sky Survey
Early Data Release: Connolly et al. 2002; Gaztañaga 2002)
and UKST (UK Schmidt Telescope: Stevenson et al. 1985).
At fainter magnitudes, we compare to galaxy catalogs from the
AAT (Anglo-Australian Telescope: Jones et al. 1991; Couch et
al. 1993), Hale Telescope (Brainerd et al. 1995), TS12
(Efstathiou et al. 1991), CFHT (Canada-France-Hawaii
Telescope: Hudon & Lilly 1996; Woods & Fahlman 1997; Infante
& Pritchet 1995), INT (Isaac Newton Telescope: Roche et al.
1993), the HDF (Hubble Deep Field: Villumsen et al. 1997),
and the HDF-South (Teplitz et al. 2001).

Overall, our measurement is quite precise, which is
expected given the uniformity of our data and the extremely
large size of our galaxy sample. The top panel in Figure 17
shows the comparison with previous efforts that cover the
same magnitude range as our data. For this figure, we have
shifted the EDR measurements (Connolly et al. 2002;
Gaztañaga 2002) by 0.2 magnitudes to account for the known
SDSS EDR photometry error (Abazajian et al. 2004). In
general, our amplitudes agree well with Stevenson et al.
(1985), Gaztañaga (2002), and Jones et al. (1991). While our
measured clustering strength is within one sigma of the SDSS
EDR results of Connolly et al. (2002), they are not as close
as one might expect. However, we have shown that both the
galaxy density and galaxy clustering strength are mildly
stripe dependent (see, e.g., Figures 5 and 12); thus it is not
surprising that there would be differences between the single
EDR stripe and our full, thirty-four stripe DR7 sample.
Finally, we note that our results agree with the general
trend shown by previous results at fainter magnitudes.

Figure 17. The correlation function amplitude versus r-band
magnitude at $\theta$ = 1°. Top: The comparison of our result
with the results from Connolly et al. (2002), Gaztañaga
(2002), and Stevenson et al. (1985); we shift the first two
results by 0.2 magnitudes to account for the known SDSS EDR
photometry problem. Bottom: The comparison of these
correlation amplitudes to correlation amplitudes measured
from fainter galaxy catalogs (see text for details). Note that
we have ignored the small differences between the various
r-band filters used by these different authors.

While we have presented the first complete SDSS galaxy
angular correlation measurement in this paper, there remains
considerable work to do in this area. First, our analysis of
the resulting correlation functions in this paper has
followed the standard power-law clustering model (e.g.,
Brunner et al. 2000; Teplitz et al. 2001). These models are
no longer as popular, in part because they do not capture the
full nuances of the galaxy angular correlation function as
measured from large, uniform data sets. We see this in the
reduced $\chi^2$ values for our fits, which are 4.26 for the
faintest magnitude bin and 5.30 for the full sample. Newer
approaches have been developed (e.g., Brown et al. 2008;
Coupon et al. 2012) that allow stronger constraints to be
placed on structure formation models. In addition, halo
models (e.g., Zheng et al. 2005) have been developed for the
interpretation of galaxy spatial clustering measurements, and
these models have been extended to the analysis of angular
clustering measurements when augmented by photometric
redshifts (Ross et al. 2006, 2007). Thus, an interesting next
step will be to extend the measurements presented in this
paper to use the SDSS photometric redshift estimates (Csabai
et al. 2007) within these more advanced techniques (Coupon et
al. 2012).

In addition, the SDSS photometric redshift estimation
process also provides a spectral type classification that
can be used to divide the SDSS galaxy sample into early- and
late-type galaxy samples (see, e.g., Budavari et al. 2003).
Previous efforts have used these type classifications along
with the photometric redshift estimates to construct
volume-limited galaxy samples (that can also be further
subdivided by galaxy type) to measure the evolution of the
angular clustering of a volume-limited sample of galaxies via
correlation functions (Ross & Brunner 2009; Ross et al. 2010)
and via the angular power spectrum (Hayes et al. 2012). While
this can easily be done with our current sample as a
consistency check, a more interesting analysis would be to
find photometrically classified galaxy pairs, triplets, and
quads to explore their clustering behavior both in general
and as a function of galaxy type and redshift. This would
produce a new approach to the study of central and satellite
galaxy distributions within halo occupation distribution
models.
APPENDIX A: CREATING THE GALAXY
SAMPLE

In this Appendix, we detail the steps we take to obtain the
final galaxy catalog used in the analysis presented herein,
starting with our original SQL query to the SDSS catalog
archive server.

A1 The SDSS CAS SQL Query

Our first step is to extract all relevant data from the SDSS 
catalog archive server. We do this by issuing the following 
SQL query: 

SELECT
    p.objID, p.ra, p.dec,
    p.type, p.flags, p.insideMask,
    -- First get PSF Mags
    p.psfMag_u, p.psfMagErr_u,
    p.psfMag_g, p.psfMagErr_g,
    p.psfMag_r, p.psfMagErr_r,
    p.psfMag_i, p.psfMagErr_i,
    p.psfMag_z, p.psfMagErr_z,
    -- Now get Model Mags
    p.modelMag_u, p.modelMagErr_u,
    p.modelMag_g, p.modelMagErr_g,
    p.modelMag_r, p.modelMagErr_r,
    p.modelMag_i, p.modelMagErr_i,
    p.modelMag_z, p.modelMagErr_z,
    -- Now get Petro Mags
    p.petroMag_u, p.petroMagErr_u,
    p.petroMag_g, p.petroMagErr_g,
    p.petroMag_r, p.petroMagErr_r,
    p.petroMag_i, p.petroMagErr_i,
    p.petroMag_z, p.petroMagErr_z,
    -- Now get Fiber Mags
    p.fiberMag_u, p.fiberMagErr_u,
    p.fiberMag_g, p.fiberMagErr_g,
    p.fiberMag_r, p.fiberMagErr_r,
    p.fiberMag_i, p.fiberMagErr_i,
    p.fiberMag_z, p.fiberMagErr_z,
    -- Get concentration parameters
    p.petroR50_r, p.petroR90_r,
    -- Get all extinction values
    p.extinction_u, p.extinction_g,
    p.extinction_r, p.extinction_i,
    p.extinction_z,
    -- Get all type and flag information
    p.type_u, p.type_g, p.type_r,
    p.type_i, p.type_z,
    p.flags_u, p.flags_g, p.flags_r,
    p.flags_i, p.flags_z,
    -- Get Michigan Moments for seeing
    p.mRrCc_u, p.mRrCcErr_u, p.mRrCcPSF_u,
    p.mRrCc_g, p.mRrCcErr_g, p.mRrCcPSF_g,
    p.mRrCc_r, p.mRrCcErr_r, p.mRrCcPSF_r,
    p.mRrCc_i, p.mRrCcErr_i, p.mRrCcPSF_i,
    p.mRrCc_z, p.mRrCcErr_z, p.mRrCcPSF_z,
    -- Now get all photoz values
    z.z, z.zErr, z.chiSq, z.nnIsInside, z.pztype,
    z.dmod, z.kcorr_u, z.kcorr_g, z.kcorr_r,
    z.kcorr_i, z.kcorr_z, z.absMag_u, z.absMag_g,
    z.absMag_r, z.absMag_i, z.absMag_z
-- Now the Table join
FROM PhotoPrimary AS p
LEFT OUTER JOIN
    photoz AS z ON p.objID = z.objID
-- Limit output to reasonable detections
WHERE
    ((p.dered_g < 23.0) OR
     (p.dered_r < 23.0) OR
     (p.dered_i < 23.0))



This request generates a catalog with more than 341 million 
point sources, including stars and galaxies. 
A2 Cutting to the SDSS Theoretical Footprint

The SDSS DR7 footprint is defined by all non-repeating,
survey-quality imaging runs observed prior to July 2008,
including the elliptical survey area in the Northern
Hemisphere and the three stripes in the Southern Hemisphere.
Starting with the catalog returned by our SQL query, we
first apply the SDSS theoretical footprint¹. From this cut,
we retain thirty-one stripes in the Northern Hemisphere
(9-39) and three stripes from the Southern Hemisphere (76,
82, and 86). After these cuts, our catalog has ~ 214 million
sources (i.e., we keep 62.8% of the data from the original
catalog), covering ~ 8200 square degrees of the sky, with
~ 7650 square degrees of this in the Northern Galactic Cap
high-latitude region and ~ 750 square degrees of the total
from the three stripes in the Southern Galactic Cap.

Furthermore, we mask objects that are in any of the five
image masks², which keeps 94.6% of the above data and
results in 203 million objects.

A3 Applying object detection and measurement 
flags 

For completeness, we detail the relevant SDSS photometric
flags in Table A1. In this section, we outline the method
by which we follow the SDSS project recommendations³ to
restrict our sample to clean, photometric detections by using
the object flags assigned by the SDSS photometric pipeline.
First we compute the following two meta-flags:

DEBLEND_PROBLEMS =
    PEAKCENTER ||
    NOTCHECKED ||
    (DEBLEND_NOPEAK && psfErr_r > 0.2)

INTERP_PROBLEMS =
    PSF_FLUX_INTERP ||
    BAD_COUNTS_ERR ||
    (INTERP_CENTER && CR)

which simplify subsequent flag tests. For
DEBLEND_PROBLEMS, if either PEAKCENTER or NOTCHECKED
is set, or if psfErr_r is greater than 0.2 magnitudes and
DEBLEND_NOPEAK is set, we set the DEBLEND_PROBLEMS
meta-flag. Likewise for INTERP_PROBLEMS, if the
PSF_FLUX_INTERP or BAD_COUNTS_ERR flags are set, or if
INTERP_CENTER and CR are both set, we set the
INTERP_PROBLEMS meta-flag. In the end, we only accept
objects that pass the following r-band flag test:

BINNED1 && !SATURATED && !NOPROFILE &&
!INTERP_PROBLEMS && !DEBLEND_PROBLEMS &&
(!EDGE ||
    (EDGE && (NODEBLEND || DEBLENDED_AT_EDGE))) &&
!(DEBLENDED_AS_PSF &&
    (TOO_FEW_GOOD_DETECTIONS || NOPETRO))

We now briefly discuss our strategy with respect to the
flags listed in Table A1. First, for the flags listed in the
Object status flags section, we select objects that are
BINNED1 but we do not use the BRIGHT flag. Since we
originally selected objects from the Primary catalog, which
implies !BRIGHT && (!BLENDED || NODEBLEND || nchild == 0),
we do not need to use the other flags listed in this section.
Second, for the flags listed in the Raw data problem flags
section, we select objects that are not SATURATED. For the
EDGE flag, we keep an object if it either has no EDGE flag,
or is close to a frame EDGE and has either NODEBLEND or
DEBLENDED_AT_EDGE set. Third, for the flags listed in the
Image problem flags section, we select objects that are not
NOPROFILE. Finally, for the flags listed in the Suspicious
object flags section, we exclude objects that have the flag
DEBLENDED_AS_PSF together with either
TOO_FEW_GOOD_DETECTIONS or NOPETRO. This last step is
essential because all of the objects it removes are
suspicious objects, which we have visually examined.

To summarize the previous discussion, we select all
objects that are detected in BINNED1, are not flagged
with any of SATURATED, NOPROFILE, DEBLEND_PROBLEMS, or
INTERP_PROBLEMS, and satisfy the EDGE criteria, and we
remove the suspicious objects as discussed in the previous
paragraph. Thus we exclude objects with interpolation
problems, but do not cut on EDGE alone since large galaxies
can cross SDSS fields. Overall, these flag cuts keep ~ 145
million sources, or 71.5% of the data from the previous
section.
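
The flag logic described in this section can be sketched as a small
predicate. This is only an illustration under stated assumptions: the
SDSS pipeline stores flags as bitmasks, whereas the sketch below
represents the r-band flags of one object as a Python set of flag
names, and the function names (`deblend_problems`, `interp_problems`,
`is_clean`) are hypothetical.

```python
def deblend_problems(flags, psf_err_r):
    """Meta-flag: PEAKCENTER, NOTCHECKED, or DEBLEND_NOPEAK with a large PSF error."""
    return (
        "PEAKCENTER" in flags
        or "NOTCHECKED" in flags
        or ("DEBLEND_NOPEAK" in flags and psf_err_r > 0.2)
    )


def interp_problems(flags):
    """Meta-flag: interpolation affects the PSF flux, its error, or the center."""
    return (
        "PSF_FLUX_INTERP" in flags
        or "BAD_COUNTS_ERR" in flags
        or ("INTERP_CENTER" in flags and "CR" in flags)
    )


def is_clean(flags, psf_err_r):
    """Return True if an object passes the r-band flag test of Section A3.

    flags is a set of flag names (an assumed representation, not the
    bitmask used by the SDSS pipeline).
    """
    edge_ok = "EDGE" not in flags or (
        "NODEBLEND" in flags or "DEBLENDED_AT_EDGE" in flags
    )
    suspicious = "DEBLENDED_AS_PSF" in flags and (
        "TOO_FEW_GOOD_DETECTIONS" in flags or "NOPETRO" in flags
    )
    return (
        "BINNED1" in flags
        and "SATURATED" not in flags
        and "NOPROFILE" not in flags
        and not interp_problems(flags)
        and not deblend_problems(flags, psf_err_r)
        and edge_ok
        and not suspicious
    )


# Usage: an EDGE object is kept only if it also carries NODEBLEND
# or DEBLENDED_AT_EDGE.
print(is_clean({"BINNED1", "EDGE"}, psf_err_r=0.05))
print(is_clean({"BINNED1", "EDGE", "NODEBLEND"}, psf_err_r=0.05))
```

Keeping the two meta-flags as separate functions mirrors the
structure of the flag test and makes each condition easy to check in
isolation.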

A4 Final sample selection 

Finally, following the discussion in Sections 3.2 and 3.3, we
select only those objects with dereddened r-band magnitudes
between 17 and 21, and we exclude the objects with
dereddened g- and i-band magnitudes fainter than 23. Our
final cut is to choose galaxies by selecting those objects with
type_r = 3, which is the numerical value for galaxies. This
produces a final galaxy catalog consisting of approximately
26 million galaxies.
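
As a compact restatement of these final cuts, the following sketch
applies them to a single object. The function name and argument names
are hypothetical, and the treatment of objects lying exactly on a
magnitude boundary (strict inequalities for r, inclusive limits for g
and i) is an assumption, since the text does not specify it.

```python
GALAXY_TYPE = 3  # numerical value of type_r for galaxies


def in_final_sample(dered_g, dered_r, dered_i, type_r):
    """Apply the final magnitude and type cuts of Section A4 to one object.

    Boundary handling is an assumption of this sketch.
    """
    return (
        17.0 < dered_r < 21.0
        and dered_g <= 23.0
        and dered_i <= 23.0
        and type_r == GALAXY_TYPE
    )


# Usage with illustrative (not measured) magnitudes.
print(in_final_sample(dered_g=20.1, dered_r=19.4, dered_i=19.0, type_r=3))
print(in_final_sample(dered_g=23.6, dered_r=20.8, dered_i=20.5, type_r=3))
```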



APPENDIX B: FULL SAMPLE GALAXY 
COVARIANCE MATRIX 

In this Appendix, we present the full sample galaxy
covariance matrix, calculated from Equation 8 as described in
Section 5. Given its size, we present a sample matrix for the
full galaxy catalog in Table B1; the full version and the
covariance matrices for galaxy catalogs in four magnitude bins
are available online in ASCII format.



¹ http://www.sdss.org/dr7/coverage/index.html

² http://www.sdss.org/dr7/algorithms/masks.html

³ http://www.sdss.org/dr7/products/catalogs/flags.html
Table A1. This table briefly describes all flags that may affect the quality of the SDSS imaging data. The percentage of data with
each flag is based on the SQL query in A1. The flags that have an asterisk (*) appended can be set in a single band.

Name                      Data with      Description
                          this flag (%)

INTERP_PROBLEMS:
PSF_FLUX_INTERP*          14.7           More than 20% of the PSF flux is from interpolated pixels.
BAD_COUNTS_ERR*                          The object contains many interpolation affected pixels, thus there are too few good pixels to estimate a PSF error.
INTERP_CENTER             9.28           The interpolated pixel is within 3 pixels of the object center.
COSMICRAY (CR)            12.3           The object contains cosmic rays.

DEBLEND_PROBLEMS:
PEAKCENTER*               0.549          The object uses the position of the peak pixel as its center.
NOTCHECKED                1.20           The object contains pixels that were not checked to see if they include local peaks.
DEBLEND_NOPEAK            11.9           The object is a CHILD but has no peak in at least one band.
psfErr_r                  28.2           PSF flux error in r-band.

Object status flags:
BINNED1*                  97.2           The object was detected at > 5σ in a 1 × 1 binned image.
BINNED2*                  2.97           The object was detected in a 2 × 2 binned image.
BINNED4*                  0.105          The object was detected in a 4 × 4 binned image.
DETECTED                  99.8           The object was detected in either BINNED1, BINNED2, or BINNED4.
BRIGHT                                   The object was duplicate-detected at > 200σ, which usually means r < 17.5.
BLENDED*                  9.15           The object was detected with multiple peaks, and thus there was an attempt to deblend it as a parent object.
NODEBLEND                                The object was BLENDED, but there was no attempt to deblend it.
CHILD                     26.1           The object was the result of deblending a BLENDED object. It may still be BLENDED.

Raw data problem flags:
SATURATED*                4.42           The object contains one or more saturated pixels.
EDGE                      0.432          The object is too close to the edge of a field frame.
LOCAL_EDGE                               Similar to EDGE, but one half of the CCD failed.
DEBLENDED_AT_EDGE         0.687          The object is so large that it is marked as EDGE in all fields and strips, and thus it is deblended anyway.
INTERP                    12.1           The object contains one or more interpolated-over pixels.
MAYBE_CR                  1.33           The object may be a cosmic ray.
MAYBE_EGHOST              0.143          The object may be a ghost produced by CCD electronics.

Image problem flags:
CANONICAL_CENTER*                        The measurements use the center in the r-band rather than the local band.
NOPROFILE*                               The object is either too small or too close to the edge and thus it is hard to estimate the radial flux profile.
NOTCHECKED_CENTER                        Similar to NOTCHECKED, but the affected pixels are close to the object's center.
TOO_LARGE                 2.64E-6        The object is either too large to measure its profile or has a child greater than half of the frame.
BADSKY                                   The local sky measurement is so bad that the photometry is meaningless.

Suspicious object flags:
DEBLENDED_AS_PSF          12.7           The deblending algorithm found that this child is unresolved.
TOO_FEW_GOOD_DETECTIONS   38.7           This object does not have detections with a good centroid in all bands.
NOPETRO                   26.4           The code was not able to determine the Petrosian radius for this object.
Table B1. The sample covariance matrix for the full galaxy catalog; all values are in units of $10^{-5}$. The full version and the
matrices for galaxy catalogs in four magnitude bins are available online in ASCII format.

Angle     8.8101   6.8383   5.3078   4.1198   3.1978   2.4821   1.9265   1.4953   1.1607   0.9009

8.8101    0.0094   0.0078   0.0081   0.0088   0.0096   0.0103   0.0109   0.0114   0.0124   0.0138
6.8383    0.0078   0.0094   0.0107   0.0111   0.0114   0.0117   0.0122   0.0122   0.0129   0.0136
5.3078    0.0081   0.0107   0.0146   0.0155   0.0153   0.0157   0.0164   0.0161   0.0169   0.0181
4.1198    0.0088   0.0111   0.0155   0.0185   0.0188   0.0191   0.0198   0.0192   0.0201   0.0215
3.1978    0.0096   0.0114   0.0153   0.0188   0.0209   0.0223   0.0237   0.0236   0.0254   0.0272
2.4821    0.0103   0.0117   0.0157   0.0191   0.0223   0.0262   0.0286   0.0291   0.0316   0.0342
1.9265    0.0109   0.0122   0.0164   0.0198   0.0237   0.0286   0.0322   0.0335   0.0367   0.0397
1.4953    0.0114   0.0122   0.0161   0.0192   0.0236   0.0291   0.0335   0.0368   0.0411   0.0448
1.1607    0.0121   0.0129   0.0169   0.0201   0.0254   0.0316   0.0367   0.0411   0.0472   0.0521
0.9009    0.0138   0.0136   0.0181   0.0215   0.0272   0.0342   0.0397   0.0448